PROBLEMS IN PROBABILITY THEORY


By HARALD CRAMÉR
University of Stockholm 


1. Introduction. The following survey of problems in probability theory 
has been written for the occasion of the Princeton Bicentennial Conference on 
“The Problems of Mathematics,” Dec. 17-19, 1946. It is strictly confined to 
the purely mathematical aspects of the subject. Thus all questions concerned 
with the philosophical foundations of mathematical probability, or with its 
ever increasing fields of application, will be entirely left out. 

No attempt at completeness has been made, and the choice of the problems
considered is, of course, highly subjective. It is also necessary to point out 
explicitly that the literature of the war years has only recently—and still far 
from completely—been available in Sweden. Owing to this fact, it is almost 
unavoidable that this paper will be found incomplete in many respects. 


I. FUNDAMENTAL NOTIONS 


2. Probability distributions. From a purely mathematical point of view, 
probability theory may be regarded as the theory of certain classes of additive 
set functions, defined on spaces of more or less general types. The basic struc- 
ture of the theory has been set out in a clear and concise way in the well-known 
treatise by Kolmogoroff [53]. We shall begin by recalling some of the main 
definitions. Note that the word additive, when used in connection with sets 
or set functions, will always refer to a finite or enumerable sequence of sets. 

Let $\omega$ denote a variable point in an entirely arbitrary space $\Omega$, and consider
an additive class C of sets in $\Omega$, such that the whole space $\Omega$ itself is a member of
C. Further, let P(S) be an additive set function, defined for all sets S belonging
to the class C, and suppose that

$$P(S) \ge 0 \ \text{for all } S \text{ in } C, \qquad P(\Omega) = 1.$$

We shall then say that P(S) is a probability measure, which defines a probability
distribution in $\Omega$. For any set S in C, the quantity P(S) is called the probability
of the event expressed by the relation $\omega \subset S$, i.e. the event that the variable
point $\omega$ takes a value belonging to S. Accordingly we write

$$P(S) = P(\omega \subset S).$$

Suppose now that $\omega' = g(\omega)$ is a function of the variable point $\omega$, defined
throughout the space $\Omega$, the values $\omega'$ being points of another arbitrary space
$\Omega'$. Let S' be a set in $\Omega'$ and denote by S the set of all points $\omega$ such that
$\omega' = g(\omega)$ belongs to S'. Whenever S belongs to C, we define a set function P'(S')
by writing

$$P'(S') = P(S).$$

It is then easy to see that P'(S') is defined for all S' belonging to a certain
additive class C' in the new space $\Omega'$, and that P'(S') is a probability measure
in $\Omega'$, such that P'(S') signifies the probability of the event $\omega' \subset S'$ (which is
equivalent to $\omega \subset S$). We shall say that P'(S') is attached to the probability
distribution in $\Omega'$ which is induced by the given distribution in $\Omega$ and the function
$\omega' = g(\omega)$.
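The construction of the induced measure can be sketched on a finite space. The example below is only an illustration under assumed choices: the space of two fair dice, the function g (sum of the dice), and the helper names `induced_measure` and `prob` are all hypothetical and do not come from the text.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical finite space Omega: the 36 outcomes of two fair dice.
omega_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega_space}

def g(w):
    """The function omega' = g(omega): here, the sum of the two dice."""
    return w[0] + w[1]

def induced_measure(P, g):
    """Return the point masses P'({w'}) = P({w : g(w) = w'})."""
    P_prime = defaultdict(Fraction)
    for w, p in P.items():
        P_prime[g(w)] += p
    return dict(P_prime)

def prob(point_masses, S):
    """Probability of a set S under a measure given by point masses."""
    return sum((p for w, p in point_masses.items() if w in S), Fraction(0))

P_prime = induced_measure(P, g)
S_prime = {7, 11}                                   # a set S' in Omega'
S = {w for w in omega_space if g(w) in S_prime}     # the inverse image S
```

Here `prob(P_prime, S_prime)` and `prob(P, S)` agree, which is exactly the defining relation P'(S') = P(S).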


3. Random variables. Consider in particular the case when $\omega'$ is a real
number $\xi$, such that $\xi = g(\omega)$ is a real-valued C-measurable function of the
argument $\omega$. Then C' includes the class $B_1$ of all Borel sets S' of the space
$\Omega' = R_1$ of all real numbers, and we shall call $\xi$ a one-dimensional real random
variable. The probability of the event $\xi \subset S'$ is uniquely defined for any Borel
set S' of $R_1$, as soon as the function

$$F(x) = P(\xi \le x)$$

is known for all real x. F(x) is called the distribution function (d.f.) of the
random variable $\xi$. If the function $\xi = g(\omega)$ is integrable over $\Omega$ with respect
to the measure P(S), we write

$$E\xi = \int_\Omega \xi \, dP = \int_{-\infty}^{\infty} x \, dF(x),$$

and denote this expression as the expectation or mean value of the random
variable $\xi$. Any real-valued $B_1$-measurable function $\eta = h(\xi)$ is also a random
variable with the probability distribution induced by the original $\omega$-distribution
and the function $\eta = h(g(\omega))$. If $\eta$ is integrable over $\Omega$ with respect to P, its
mean value may be written in the form

$$E\eta = Eh(\xi) = \int_\Omega \eta \, dP = \int_{-\infty}^{\infty} h(x) \, dF(x).$$
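The two expressions for the mean value of a function of a random variable can be compared on a small discrete case. This is a sketch under assumed choices: the two-dice space, the variable g (sum of the dice) and the function h are hypothetical illustrations, not taken from the text.

```python
from fractions import Fraction

# Hypothetical space Omega: two fair dice, each outcome with probability 1/36.
omega_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega_space}

def g(w):
    """The random variable xi = g(omega): sum of the two dice."""
    return w[0] + w[1]

def h(x):
    """A function of the random variable, eta = h(xi)."""
    return (x - 7) ** 2

# Integral of eta over Omega with respect to P.
mean_over_omega = sum(h(g(w)) * p for w, p in P.items())

# Integral of h(x) with respect to the d.f. of xi, i.e. a sum over its jumps.
jumps = {}
for w, p in P.items():
    jumps[g(w)] = jumps.get(g(w), Fraction(0)) + p
mean_over_line = sum(h(x) * dF for x, dF in jumps.items())
```

Both sums give the same rational number, in line with the identity between the integral over the original space and the integral with respect to F.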


More generally, if $\omega' = (\xi_1, \ldots, \xi_n)$ is a point in an n-dimensional Euclidean
space $R_n$, while C' includes the class $B_n$ of all Borel sets of $R_n$, we are
concerned with an n-dimensional real random variable. The distribution of this
variable, which is also called the joint distribution of the n one-dimensional
variables $\xi_1, \ldots, \xi_n$, is uniquely defined, as soon as the joint d.f.

$$F(x_1, \ldots, x_n) = P(\xi_1 \le x_1, \ldots, \xi_n \le x_n)$$

is known for all real $x_1, \ldots, x_n$.

The variables $\xi_1, \ldots, \xi_n$ are said to be independent, if
$F(x_1, \ldots, x_n) = F_1(x_1) \cdots F_n(x_n)$, where $F_\nu(x_\nu)$ is the d.f. of the variable $\xi_\nu$.

The extension to complex random variables is obvious. Suppose e.g. that
$\xi = g(\omega)$ and $\eta = h(\omega)$ are two one-dimensional real variables, and consider
the complex variable $\xi + i\eta = g(\omega) + ih(\omega)$. By definition, we identify the
distribution of this variable with that of the two-dimensional real variable
$(\xi, \eta)$, and we put

$$E(\xi + i\eta) = E\xi + iE\eta.$$

Joint distributions of several complex variables are introduced in a corresponding
way.


4. Characteristic functions. If $\xi$ is a one-dimensional real random variable,
the mean value

$$\varphi(z) = E e^{iz\xi} = \int_{-\infty}^{\infty} e^{izx} \, dF(x)$$

exists for all real z, and we have

$$|\varphi(z)| \le 1, \qquad \varphi(0) = 1.$$

$\varphi(z)$ is called the characteristic function (c.f.) of the distribution corresponding
to the variable $\xi$. The reciprocal formula (Lévy)

$$F(x) - F(y) = \frac{1}{2\pi i} \lim_{Z \to \infty} \int_{-Z}^{Z} \frac{e^{-izy} - e^{-izx}}{z} \, \varphi(z) \, dz,$$

which holds for any continuity points x and y of F, shows that there is a one-one
correspondence between the d.f. F(x) and the c.f. $\varphi(z)$. As we shall see
below, the c.f. provides a powerful analytical tool for operations with
probability distributions.
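The reciprocal formula can be tried numerically. The sketch below approximates the integral by a midpoint rule on a finite interval; the function name `invert_cf`, the truncation point Z and the step count are assumptions made for this illustration, and the standard normal c.f. is used only because its d.f. is available through `math.erf` for comparison.

```python
import cmath
import math

def invert_cf(phi, x, y, Z=12.0, steps=24000):
    """Approximate F(x) - F(y) from the c.f. phi by the midpoint rule on [-Z, Z].

    `steps` should be even, so that no midpoint falls on z = 0.
    """
    h = 2 * Z / steps
    total = 0j
    for k in range(steps):
        z = -Z + (k + 0.5) * h
        kernel = (cmath.exp(-1j * z * y) - cmath.exp(-1j * z * x)) / (1j * z)
        total += kernel * phi(z)
    return (total * h / (2 * math.pi)).real

def normal_cf(z):
    """c.f. of the standard normal distribution."""
    return math.exp(-z * z / 2)

# Probability mass of the standard normal distribution between -1 and 1.
estimate = invert_cf(normal_cf, x=1.0, y=-1.0)
reference = math.erf(1 / math.sqrt(2))
```

For a c.f. that decays slowly the truncation at Z would need more care; the rapidly decaying normal c.f. was chosen to keep the sketch simple.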

When a complex-valued function $\varphi(z)$ of the real variable z is given, it is
often important to be able to decide whether $\varphi(z)$ is or is not the c.f. of some
distribution. If we assume a priori that $\varphi(0) = 1$, each of the following
conditions is necessary and sufficient for $\varphi(z)$ to be a c.f.

A. $\varphi(z)$ should be bounded and continuous for all z, and such that the integral

$$\int_0^A \int_0^A \varphi(z - u) \, e^{ix(z - u)} \, dz \, du$$

is real and non-negative for all real x and all A > 0 (Cramér [11], in simplification
of an earlier result due to Bochner [4]).

B. There should exist a sequence of functions $\psi_1(z), \psi_2(z), \ldots$ such that

$$\varphi(z) = \lim_{n \to \infty} \int_{-\infty}^{\infty} \psi_n(x + z) \, \overline{\psi_n(x)} \, dx$$

holds uniformly in every finite z-interval (Khintchine [45]).

These general theorems are not always easy to apply in practice. Among
less general results which are more easily applicable, we mention the almost
trivial fact that a function $\varphi(z)$ which near z = 0 is of the form $\varphi(z) = 1 + o(z^2)$
cannot be a c.f. unless $\varphi(z) = 1$ for all z, and the two following theorems:

1) An integral function $\varphi(z)$ of order $\gamma < 1$ can never be a c.f. (Lévy [64]), and

2) an integral function $\varphi(z)$ of finite order $\gamma > 2$ cannot be a c.f. unless the
convergence exponent of its zeros is equal to $\gamma$ (Marcinkiewicz [72]). The
latter result shows e.g. that no function of the form $e^{g(z)}$, where g(z) is a
polynomial of degree > 2, can be a c.f.

It would be highly desirable to obtain further results in this direction. 
The c.f. of the joint distribution of n real random variables $\xi_1, \ldots, \xi_n$ is the
function $\varphi(z_1, \ldots, z_n)$ defined by the relation

$$\varphi(z_1, \ldots, z_n) = E e^{i(z_1 \xi_1 + \cdots + z_n \xi_n)}.$$


Most of the above results for c.f. in one variable can be directly generalized 
to the multi-variable case. 


5. Random sequences and random functions. Let t be a variable point in
an arbitrary space T, and consider the space $\Omega$, where each point $\omega$ is a
real-valued function $\omega = x(t)$ of the variable argument t. Let $t_1, \ldots, t_n$ be any
finite set of distinct points t. The set of all functions $\omega = x(t)$ satisfying the
inequalities

$$a_i < x(t_i) \le b_i \qquad (i = 1, \ldots, n)$$

will be called an interval in the space $\Omega$. The Borel sets in $\Omega$ will be defined as
the smallest additive class B of sets in $\Omega$ containing all intervals.

Suppose now that, for any choice of n and the $t_i$, the variables $x(t_1), \ldots, x(t_n)$
are random variables having a known n-dimensional joint distribution. If the
family of all distributions corresponding in this way to finite sequences
$t_1, \ldots, t_n$ satisfies certain obvious consistency conditions, a fundamental theorem
due to Kolmogoroff asserts that this family determines a unique probability
distribution in the space $\Omega$ of all functions x(t). The corresponding probability

$$P(S) = P(x(t) \subset S)$$

is uniquely defined for all Borel sets S of $\Omega$.

Consider in particular the case where T is the set of non-negative integers
t = 0, 1, 2, .... The space $\Omega$ then is the space of all sequences $(x_0, x_1, \ldots)$
of real numbers. As soon as the joint distribution of any finite number of
variables $x_{t_1}, \ldots, x_{t_n}$ is defined, and these distributions are mutually
consistent, it then follows that there is a unique probability distribution of the
random sequence $(x_0, x_1, \ldots)$, the corresponding probability being defined
for every Borel set of the space $\Omega$ of sequences. Similarly we may consider the
doubly infinite sequence $(\ldots, x_{-1}, x_0, x_1, \ldots)$.

Consider further the more general case when T is any set of real numbers.
Then $\Omega$ is the space of all real-valued functions $\omega = x(t)$ defined on the set T,
and as before the knowledge of the distributions for all finite sets of variables
$x(t_1), \ldots, x(t_n)$ permits us to determine a probability distribution in the space
$\Omega$ of random functions x(t), the probability $P(S) = P(x(t) \subset S)$ being always
defined for all Borel sets S in $\Omega$.

The generalization of the above considerations to complex-valued random 
sequences and functions is immediate. 


6. Various modes of convergence. Consider a sequence $F_1(x), F_2(x), \ldots$
of d.f.'s, and let the corresponding c.f.'s be $\varphi_1(t), \varphi_2(t), \ldots$. In order that $F_n(x)$
converge to a d.f. F(x), in every continuity point of the latter, it is necessary
and sufficient¹ that $\varphi_n(t)$ converge for every real t to a limit $\varphi(t)$ which is
continuous at t = 0. Then $\varphi(t)$ is the c.f. corresponding to the d.f. F(x).
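A familiar instance of this convergence criterion is the passage from the binomial to the Poisson distribution. The sketch below compares the two c.f.'s on a grid of arguments; the parameter value lam = 2 and the grid are arbitrary assumptions, and the function names are hypothetical.

```python
import cmath

def cf_binomial(t, n, lam):
    """c.f. of the number of successes in n trials, each with probability lam / n."""
    return (1 + (lam / n) * (cmath.exp(1j * t) - 1)) ** n

def cf_poisson(t, lam):
    """c.f. of the Poisson distribution with parameter lam."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

def max_gap(n, lam=2.0, ts=None):
    """Largest difference between the two c.f.'s over a grid of real arguments t."""
    ts = ts or [k / 10 for k in range(-60, 61)]
    return max(abs(cf_binomial(t, n, lam) - cf_poisson(t, lam)) for t in ts)

gaps = [max_gap(n) for n in (10, 100, 10000)]
```

The gap shrinks as n grows, and the limit c.f. is continuous at t = 0, so the criterion gives convergence of the binomial d.f.'s to the Poisson d.f. at its continuity points.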

Further, let x and $x_1, x_2, \ldots$ be complex-valued random variables, such
that the random sequence $(x, x_1, x_2, \ldots)$ has a well defined distribution. We
shall be concerned with various modes of convergence of $x_n$ to x.

A. When $P(|x_n - x| > \varepsilon) \to 0$ as $n \to \infty$, for any $\varepsilon > 0$, we shall say that
$x_n$ converges to x in probability.

B. When $E|x_n - x|^\gamma \to 0$ as $n \to \infty$, where $\gamma > 0$ is fixed, we shall say that
$x_n$ converges to x in the mean of order $\gamma$. Unless otherwise stated we shall in
the sequel always consider the case $\gamma = 2$, and in this case we shall use the
notation

$$\underset{n \to \infty}{\mathrm{l.i.m.}} \; x_n = x.$$

C. When $P(\lim x_n = x) = 1$, we shall say that $x_n$ converges with probability
one, or converges almost certainly to x.

With respect to the last definition, we may remark that the set defined by
the relation $\lim x_n = x$ is always a Borel set in the space of our random sequence,
so that the probability of this relation is well defined. In fact, this probability
is given by the expression

$$\lim_{m \to \infty} \lim_{n \to \infty} \lim_{p \to \infty} P\left(|x_\nu - x| < \frac{1}{m} \ \text{for } \nu = n, n+1, \ldots, n+p\right),$$

where the limit process applies to a probability attached to a Borel set in a finite
number of dimensions. The case of almost certain convergence is precisely
the case when this expression takes the value 1.

Convergence in the mean of any positive order, as well as almost certain
convergence, both imply convergence in probability, which may be written
symbolically B → A and C → A. Between B and C, there is no simple relation
of this kind. Further, A and B both imply almost certain convergence for any
partial sequence $x_{n_1}, x_{n_2}, \ldots$ such that the subscripts $n_k$ increase sufficiently
rapidly with k.
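That convergence in the mean does not by itself give almost certain convergence can be seen on a standard example, which is not from the text: on the interval [0, 1) with Lebesgue measure, let the n-th variable be the indicator of a dyadic interval that sweeps across [0, 1) again and again. The sketch below, with hypothetical function names, counts how often a fixed point is hit.

```python
from fractions import Fraction

def level_and_offset(n):
    """Write n = 2**k + j with 0 <= j < 2**k (for n >= 1)."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def x(n, w):
    """x_n(w): indicator of the interval [j / 2**k, (j + 1) / 2**k)."""
    k, j = level_and_offset(n)
    return 1 if Fraction(j, 2**k) <= w < Fraction(j + 1, 2**k) else 0

def prob_nonzero(n):
    """P(x_n != 0) = E|x_n|**2 = length of the interval = 2**(-k)."""
    k, _ = level_and_offset(n)
    return Fraction(1, 2**k)

w = Fraction(1, 3)                                   # an arbitrary fixed point
hits = [n for n in range(1, 2**10) if x(n, w) == 1]  # indices with x_n(w) = 1
```

The mean square of the n-th variable tends to zero, so the sequence converges to 0 in the mean and in probability, yet every point is hit once on each dyadic level, so the sequence converges at no point. Along the partial sequence n = 1, 2, 4, 8, ... it does converge to 0 almost certainly, in agreement with the remark on rapidly increasing subscripts.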


II. PROBLEMS CONNECTED WITH THE ADDITION OF
INDEPENDENT VARIABLES 


7. During the early development of the theory of probability, the majority
of problems considered were connected with gambling. The gain of a player
in a certain game may be regarded as a random variable, and his total gain in a
sequence of repetitions of the game is the sum of a number of independent
variables, each of which represents the gain in a single performance of the game.
Accordingly a great amount of work was devoted to the study of the probability
distributions of such sums. A little later, problems of a similar type appeared
in connection with the theory of errors of observation, when the total error was
considered as the sum of a certain number of partial errors due to mutually
independent causes. At first only particular cases were considered, but
gradually general types of problems began to arise, and in the classical work of Laplace
several results are given concerning the general problem of studying the distribution
of a sum

$$z_n = x_1 + \cdots + x_n$$

of independent variables, when the distributions of the $x_i$ are given. This
problem may be regarded as the very starting point of a large number of those
investigations by which the modern Theory of Probability was created. The
efforts to prove certain statements of Laplace, and to extend his results further
in various directions, have largely contributed to the introduction of rigorous
foundations of the subject, and to the development of the analytical methods.
At the same time, more general types of problems have developed from the
original problem, and the number and importance of practical applications
have been steadily increasing.

1 As I have already stated in a paper published in 1938, there is an error in the
statement of this theorem (the convergence theorem of section 6) given in my
Cambridge Tract [9] Random Variables and Probability Distributions. For the truth
of the theorem, it is essential that $\varphi_n(t)$ should be supposed to converge to
$\varphi(t)$ for every real t. However, in the particular case when the limit $\varphi(t)$
is analytic and regular in the vicinity of t = 0, it can be proved that it is sufficient to assume
convergence in some interval |t| < a.

8. Composition of distributions. Let $x_1$ and $x_2$ be two independent variables,
with the d.f.'s $F_1$ and $F_2$, and the c.f.'s $\varphi_1$ and $\varphi_2$, and let the sum $x_1 + x_2$ have
the d.f. F and the c.f. $\varphi$. Then

$$F(x) = \int_{-\infty}^{\infty} F_1(x - y) \, dF_2(y) = \int_{-\infty}^{\infty} F_2(x - y) \, dF_1(y).$$

We shall say that F is the composition of $F_1$ and $F_2$, and write this as a symbolical
multiplication:

$$F = F_1 * F_2 = F_2 * F_1.$$

To this symbolical multiplication of the d.f.'s corresponds a real multiplication
of the c.f.'s:

$$\varphi(z) = \varphi_1(z) \varphi_2(z).$$
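For step-function d.f.'s the composition reduces to a finite sum over the jumps, and the multiplication rule for the c.f.'s can be checked directly. The following is a minimal sketch; the two distributions and the helper names `compose` and `cf` are assumptions chosen for illustration.

```python
import cmath

def compose(f1, f2):
    """Composition F1 * F2 of two distributions given by their jumps {point: mass}."""
    out = {}
    for x1, p1 in f1.items():
        for x2, p2 in f2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

def cf(f, z):
    """c.f. of a distribution given by its jumps."""
    return sum(p * cmath.exp(1j * z * x) for x, p in f.items())

f1 = {0: 0.5, 1: 0.5}            # hypothetical first component
f2 = {0: 0.2, 1: 0.3, 2: 0.5}    # hypothetical second component
f = compose(f1, f2)

# Differences between the c.f. of the composite and the product of the c.f.'s.
gaps = [abs(cf(f, z) - cf(f1, z) * cf(f2, z)) for z in (-2.0, 0.3, 1.0, 5.0)]
```

Up to rounding, every entry of `gaps` is zero, and `compose(f2, f1)` gives the same jumps as `compose(f1, f2)`.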

The operation of composition is both commutative and associative, so that
any symbolical product $F = F_1 * F_2 * \cdots * F_n$ is uniquely defined and independent
of the order of the components. When at least one of the components is
continuous (absolutely continuous), the same holds for the composite, and in
many cases it is true that the composite is at least as regular as the most regular
of the components (Lévy [58], [63], etc.). However, this statement does not
hold in general, as is shown by an interesting example due to Raikov [77],
where $F_1$ and $F_2$ are integral analytic functions, while the composite
$F = F_1 * F_2$ is not regular at the origin.

It seems to be an important unsolved problem to find convenient restrictions
ensuring the validity of the above statements of the "smoothing effect" of
the operation of composition.

When $F = F_1 * F_2$, we may say that F is "divisible" by each component $F_1$
and $F_2$, and it seems natural to try to develop a theory of symbolical
factorization for d.f.'s. In this connection, it is important to note that symbolical
division is not unique. In fact, Khintchine has shown by an example that it is
possible to find the d.f.'s $F$, $F_1$, $F_2$ and $F_3$ such that

$$F = F_1 * F_2 = F_1 * F_3,$$

while $F_2 \ne F_3$. Another fundamental problem belonging to this order of ideas
is to decide whether a given d.f. F is decomposable or not. F is called
decomposable, if there is at least one representation of the form $F = F_1 * F_2$, where
each component $F_\nu$ has more than one point of increase. So far, this problem
has only been solved in very special cases, and the general problem still
remains open for research. A particular case of some interest would be to know
if there exists an absolutely continuous and indecomposable d.f., such that
F(a) = 0 and F(b) = 1 for some finite a and b.

As soon as we restrict ourselves to certain special classes of distributions, 
it is possible to reach results of a more definite character concerning the factori- 
zation problems. Some results of this type will be considered below. 


9. Closed families of distributions. The fact that certain families of dis- 
tributions are closed with respect to the operation of composition has played 
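an important part in many applications. If $F_1$ and $F_2$ belong to a family of
this character, so does the symbolical product $F = F_1 * F_2$. We first give some
simple examples of such families.

The normal distribution. The d.f. F has the form $F = \Phi\left(\frac{x - m}{\sigma}\right)$, where
$\sigma > 0$, and

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt.$$

The c.f. corresponding to F is $e^{miz - \frac{1}{2}\sigma^2 z^2}$, and it follows that for any real
$m_1$, $m_2$ and any positive $\sigma_1$, $\sigma_2$ we have

$$\Phi\left(\frac{x - m_1}{\sigma_1}\right) * \Phi\left(\frac{x - m_2}{\sigma_2}\right) = \Phi\left(\frac{x - m}{\sigma}\right),$$

where

$$m = m_1 + m_2, \qquad \sigma^2 = \sigma_1^2 + \sigma_2^2.$$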

The Poisson distribution. Here the d.f. is $F = F(x; \lambda, m, a)$ where $\lambda > 0$,
$a \ne 0$, and F is a step-function with a jump equal to $\frac{\lambda^\nu}{\nu!} e^{-\lambda}$ in the point
$x = m + \nu a$, where $\nu = 0, 1, \ldots$. The corresponding c.f. is $e^{miz + \lambda(e^{iaz} - 1)}$, and it
follows that for any fixed a we have

$$F(x; \lambda_1, m_1, a) * F(x; \lambda_2, m_2, a) = F(x; \lambda_1 + \lambda_2, m_1 + m_2, a).$$
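The closure of the Poisson family under composition can be checked numerically from the jumps. In the sketch below the parameter values, the truncation after 60 jumps and the helper names are assumptions made for the illustration; integer values of m and a are used so that the jump points can serve as exact dictionary keys.

```python
import math

def poisson_jumps(lam, m, a, terms=60):
    """Jumps of F(x; lam, m, a): mass lam**v / v! * exp(-lam) at x = m + v * a."""
    return {m + v * a: lam**v / math.factorial(v) * math.exp(-lam)
            for v in range(terms)}

def compose(f1, f2):
    """Composition of two step-function d.f.'s given by their jumps."""
    out = {}
    for x1, p1 in f1.items():
        for x2, p2 in f2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

a = 2                                   # common step of the two components
composite = compose(poisson_jumps(1.5, 0, a), poisson_jumps(2.5, 1, a))
expected = poisson_jumps(1.5 + 2.5, 0 + 1, a)

# Compare the first 20 jumps, which the truncation does not affect.
worst = max(abs(composite[1 + v * a] - expected[1 + v * a]) for v in range(20))
```

The two sets of jumps agree to rounding error, as the composition rule for the Poisson family requires.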
The Pearson Type III distribution. $F = F(x; a, \lambda) = \frac{a^\lambda}{\Gamma(\lambda)} \int_0^x t^{\lambda - 1} e^{-at} \, dt$
(x > 0). The corresponding c.f. is $\left(1 - \frac{iz}{a}\right)^{-\lambda}$, and for any fixed a > 0 and any
positive $\lambda_1$ and $\lambda_2$ we have

$$F(x; a, \lambda_1) * F(x; a, \lambda_2) = F(x; a, \lambda_1 + \lambda_2).$$


Stable distributions. We shall say that a closed family is stable, when all
its members are of the form F(ax + b), where F is a d.f., while a > 0 and b are
constants. Obviously the normal family is an example of a stable family. It
has been shown by Lévy and Khintchine [49] that a d.f. F(x) generates a stable
family when and only when the logarithm of its c.f. is of the form

$$\log \varphi(z) = \beta i z - \gamma |z|^\alpha \left(1 + i\delta \frac{z}{|z|} \omega\right), \tag{9.1}$$

where $\alpha$, $\beta$, $\gamma$, $\delta$ are real constants such that

$$0 < \alpha \le 2, \qquad \gamma > 0, \qquad |\delta| \le 1,$$

while

$$\omega = \begin{cases} \tan \dfrac{\pi\alpha}{2} & \text{for } \alpha \ne 1, \\[4pt] \dfrac{2}{\pi} \log |z| & \text{for } \alpha = 1. \end{cases}$$

For $\alpha = 2$ we obtain the normal family.
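As a worked instance of (9.1), added here as an illustration and not taken from the original survey, one may take $\alpha = 1$, $\beta = 0$, $\delta = 0$. This gives the symmetric Cauchy family, for which the stability property can be read off directly from the c.f.:

```latex
% Worked example (illustrative): the symmetric Cauchy family as a stable family.
% With alpha = 1, beta = 0, delta = 0 the form (9.1) reduces to
\[
  \log \varphi(z) = -\gamma |z| , \qquad \gamma > 0 ,
\]
% which is the c.f. of the d.f.
\[
  F\!\left(\frac{x}{\gamma}\right), \qquad
  F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan x .
\]
% Composition multiplies the c.f.'s, hence
\[
  e^{-\gamma_1 |z|} \, e^{-\gamma_2 |z|} = e^{-(\gamma_1 + \gamma_2) |z|} ,
  \qquad\text{i.e.}\qquad
  F\!\left(\frac{x}{\gamma_1}\right) * F\!\left(\frac{x}{\gamma_2}\right)
  = F\!\left(\frac{x}{\gamma_1 + \gamma_2}\right) ,
\]
% so the composite is again of the form F(ax + b),
% with a = 1 / (gamma_1 + gamma_2) and b = 0.
```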

A more general and very important closed family is the family J of infinitely
divisible distributions. A d.f. F belongs to J if to every n = 1, 2, ... there
exists a d.f. G such that $F = G^{*n}$, where $G^{*n}$ denotes the symbolical nth power
of G. Obviously the family J is a closed family which contains all the families
mentioned above. Lévy [60], [63] has shown that F is infinitely divisible when
and only when the logarithm of its c.f. is of the form

$$\log \varphi(z) = \beta i z - \gamma z^2 + \int_{-\infty}^{0} \left(e^{izu} - 1 - \frac{izu}{1 + u^2}\right) dM(u) + \int_{0}^{\infty} \left(e^{izu} - 1 - \frac{izu}{1 + u^2}\right) dN(u), \tag{9.2}$$

where $\beta$ and $\gamma \ge 0$ are real constants, while M(u) and N(u) are non-decreasing
functions such that

$$M(-\infty) = N(+\infty) = 0,$$

$$\int_{-a}^{0} u^2 \, dM(u) < \infty \quad \text{and} \quad \int_{0}^{a} u^2 \, dN(u) < \infty$$

for any finite a > 0. When M and N reduce to zero, we obtain the normal
family. When $\gamma = 0$ and one of the functions M and N reduces to zero, while
the other is a step-function with a single jump equal to $\lambda$ at the point u = a,
we obtain a Poisson family. Generally, it follows from (9.2) that any infinitely
divisible distribution may be regarded as a product of a normal distribution
and a finite, enumerable or continuous set of Poisson distributions.

The representation of $\log \varphi(z)$ in the form (9.2) is unique. It follows that
the problem of finding all possible factorizations of an infinitely divisible d.f. F
can be completely solved, as long as we restrict ourselves to factors which are
themselves infinitely divisible. In fact, in order that

$$F = F_1 * F_2,$$

where all three d.f.'s belong to J, it is necessary and sufficient that the logarithms
of the corresponding c.f.'s should be of the form (9.2), with

$$\beta = \beta_1 + \beta_2, \qquad \gamma = \gamma_1 + \gamma_2, \qquad M = M_1 + M_2, \qquad N = N_1 + N_2.$$

In the two simple cases of the normal and the Poisson distributions, the 
decompositions obtained in this way remain the only possible, even if we remove 
the restriction that the factors should belong to J. Thus in any factorization
of a normal distribution, all factors are normal (Cramér, [8]), while in any fac- 
torization of a Poisson distribution, all factors belong to the Poisson family 
(Raikov, [75]). For the type III distribution, and the non-normal stable dis- 
tributions, however, the corresponding property does not hold. 

In some cases, an infinitely divisible distribution may be represented as a 
product of indecomposable distributions, or as a product of an indecomposable 
distribution and another infinitely divisible distribution. The results so far 
obtained in this direction (Lévy, [63], [64], Khintchine, [46], [47]; Raikov, [76]) 
are all concerned with more or less particular cases, and the general factoriza- 
tion problem for infinitely divisible distributions still remains unsolved. A 
particular case of some interest would be the case when the functions M and N 
are both absolutely continuous. No example of this type, in which a factor
not belonging to J occurs, seems to have been given.²

Finally we mention a general theorem due to Khintchine, [46], which asserts 
that an arbitrary d.f. F may be represented in one of the forms 


$$F = G, \qquad F = H \qquad \text{or} \qquad F = G * H,$$


where G is infinitely divisible, while H is a finite or infinite product of inde- 
composable factors. This seems to be practically the only result so far known 
concerning the factorization of a general distribution. 

A certain number of the results mentioned above have been generalized to 
multi-dimensional distributions. 


2 While the present paper was being printed, I have proved that such factors do occur,
as soon as at least one of the derivatives M' and N' is bounded away from zero in some
interval (-a, 0) or (0, a).
10. The Laws of large numbers. In modern terminology, the classical
Bernoulli theorem may be expressed in the following way. Let $x_1, x_2, \ldots$ be
a sequence of independent variables, such that each $x_\nu$ may only assume the
values 1 and 0, the corresponding probabilities being p and q = 1 - p. Then
the arithmetic mean

$$\frac{z_n}{n} = \frac{x_1 + \cdots + x_n}{n} \tag{10.1}$$

converges in probability to p, as $n \to \infty$.
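The content of the Bernoulli theorem can be made concrete by computing the probability of a deviation exactly (up to floating-point rounding) from the binomial distribution. The function name and the values p = 0.3 and eps = 0.05 below are assumptions made for the sketch.

```python
import math

def deviation_probability(n, p, eps):
    """P(|z_n / n - p| > eps) for n independent 0-1 variables with P(x = 1) = p."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > eps:
            total += math.comb(n, k) * p**k * (1 - p)**(n - k)
    return total

# The probability of a deviation larger than eps, for increasing n.
tail = [deviation_probability(n, p=0.3, eps=0.05) for n in (10, 100, 1000)]
```

The entries of `tail` decrease towards zero, which is what convergence in probability of the arithmetic mean to p asserts for this choice of eps.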

Both classical and modern authors have devoted much work to the
generalization of this simple result in various directions. Generally, we shall say
that a sequence of random variables $x_1, x_2, \ldots$ satisfies the Weak Law of Large
Numbers if there exist two sequences of constants $a_1, a_2, \ldots$ and $b_1, b_2, \ldots$
such that $a_n > 0$, and

$$\frac{z_n - b_n}{a_n} = \frac{x_1 + \cdots + x_n - b_n}{a_n}$$

converges in probability to zero.

Let $x_1, x_2, \ldots$ be independent variables, such that $x_\nu$ has the d.f. $F_\nu(x)$.
It has been shown by Feller [27] that for any given sequence $a_1, a_2, \ldots$, the
conditions

$$\sum_{\nu=1}^{n} \int_{|x| > a_n} dF_\nu(x) = o(1), \qquad \sum_{\nu=1}^{n} \int_{|x| < a_n} x^2 \, dF_\nu(x) = o(a_n^2) \tag{10.2}$$

are sufficient for the validity of the weak law of large numbers, and that the
corresponding sequence $b_1, b_2, \ldots$ can be defined by

$$b_n = \sum_{\nu=1}^{n} \int_{|x| < a_n} x \, dF_\nu(x).$$

When there is a constant c > 0 such that for all $\nu$

$$F_\nu(+0) > c, \qquad F_\nu(-0) < 1 - c, \tag{10.3}$$

the conditions are also necessary. This theorem contains as particular cases
all previously known results in this direction. A simple necessary and sufficient
condition for the existence of at least one sequence $a_1, a_2, \ldots$ such that (10.2)
holds does not seem to be known.

When the weak law is satisfied, this means that, for any given $\varepsilon > 0$ and for
any fixed large n, there is a probability very near to 1 that the sum
$z_n = x_1 + \cdots + x_n$ will fall between the limits $b_n \pm \varepsilon a_n$. The more stringent
condition that, with a probability tending to 1 as $n \to \infty$, $z_\nu$ will fall between the
limits $b_\nu \pm \varepsilon a_\nu$ for all values of $\nu \ge n$ is equivalent to the condition that
$\frac{z_n - b_n}{a_n}$ converges almost certainly to zero. When this holds, we shall say that the
variables $x_\nu$ satisfy the Strong Law of Large Numbers. The most important result so far
known in this connection is concerned with the case $a_n = n$, and is expressed
by the following theorem (Kolmogoroff [52], [55]):

When the $x_\nu$ are independent and (10.3) holds, a sufficient condition for the
validity of the strong law with $a_n = n$ consists in the simultaneous convergence of the
two series

$$\sum_{n=1}^{\infty} \int_{|x| > n} dF_n(x) \qquad \text{and} \qquad \sum_{n=1}^{\infty} \frac{1}{n^2} \int_{|x| < n} x^2 \, dF_n(x).$$

Some improved conditions of this type have been given by Marcinkiewicz
and Zygmund [73], but the problem of finding a necessary and sufficient condition
for the strong law is still unsolved, even in the case $a_n = n$.

Important generalizations of the laws of large numbers to cases when the
$x_\nu$ are not assumed to be independent have been given i.a. by Khintchine [44],
Lévy [62], [63] and Loève [67].


11. The central limit theorem and allied theorems. It was already known
to De Moivre that, in the case (10.1) of the Bernoulli distribution, the d.f. of
the normalized sum

$$\frac{x_1 + \cdots + x_n - np}{\sqrt{npq}}$$

tends, as $n \to \infty$, to the normal d.f. $\Phi(x)$. Considerably more general results
in this direction were stated by Laplace. After a long series of more or less
successful attempts, a rigorous proof of the main statements of Laplace was
given in 1901 by Liapounoff [65]. More general cases were later considered i.a.
by Lindeberg [66], Lévy [61], [63], Khintchine [43] and Feller [25]. The
following final form of the Central Limit Theorem is due to Feller.
Consider the expression

$$u_n = \frac{z_n - b_n}{a_n} = \frac{x_1 + \cdots + x_n - b_n}{a_n}, \tag{11.1}$$

where the $x_\nu$ are independent variables. We shall say that the $x_\nu$ obey the
central limit law, if the sequences $\{a_n\}$ and $\{b_n\}$ can be found such that the
d.f. of $u_n$ tends to $\Phi(x)$ as $n \to \infty$. In order to avoid unnecessary
complications, we shall restrict ourselves to sequences $\{a_n\}$ such that

$$a_n \to +\infty, \qquad \frac{a_{n+1}}{a_n} \to 1,$$

and we shall assume that the conditions (10.3) are satisfied. Then Feller's
theorem runs as follows:
The independent variables $x_1, x_2, \ldots$ obey the central limit law if, and only if,
there exists a sequence $q_n \to \infty$ such that simultaneously

$$\sum_{\nu=1}^{n} \int_{|x| > q_n} dF_\nu(x) \to 0, \qquad \frac{1}{q_n^2} \sum_{\nu=1}^{n} \int_{|x| < q_n} x^2 \, dF_\nu(x) \to \infty. \tag{11.2}$$

When these conditions are satisfied, explicit expressions for the $a_n$ and $b_n$ can be
obtained.

Feller’s theorem gives a complete solution of the problem. However, we 
might still try to express in a more direct way the condition that the $q_n$ should
exist. We may also ask what happens when the conditions (11.2) are not 
satisfied. Some particular cases of the latter question will be considered below. 
However, very few general results are known in this direction. 
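The simplest case covered by the theorem, the De Moivre case of 0-1 variables, can be examined numerically. The sketch below measures the largest gap between the d.f. of the normalized sum and the normal d.f. at the jump points; the function names and the choice p = 0.5 are assumptions made for this illustration.

```python
import math

def Phi(x):
    """Normal d.f., expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_distance(n, p):
    """Largest gap between the d.f. of (z_n - n p) / sqrt(n p q) and Phi.

    The gap is evaluated on both sides of every jump of the step-function d.f.
    """
    q = 1.0 - p
    cum, worst = 0.0, 0.0
    for k in range(n + 1):
        x = (k - n * p) / math.sqrt(n * p * q)
        worst = max(worst, abs(cum - Phi(x)))     # just to the left of the jump
        cum += math.comb(n, k) * p**k * q**(n - k)
        worst = max(worst, abs(cum - Phi(x)))     # at the jump itself
    return worst

distances = [max_distance(n, 0.5) for n in (10, 100, 1000)]
```

The distances shrink roughly like the reciprocal of the square root of n, which reflects the size of the individual jumps of the binomial d.f.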

The central limit theorem has been extended in various directions.
Bernstein [3], Lévy [62], [63], Loève [67] and others have considered cases where the
$x_\nu$ are not assumed to be independent. Important results have been reached
but still much remains to be done.

On the other hand, several authors have considered symmetrical functions, 
other than sums, of n independent random variables. The problem of investi- 
gating the asymptotic behaviour of the distributions of such functions, as n 
tends to infinity, is of great importance in the theory of statistical sampling 
distributions. It is known (cf. e.g. Cramér, [15]) that under certain general
regularity conditions there exists a normal limiting distribution. However, it 
is also known that it is possible to give examples of particular functions (such 
as e.g. the function which is equal to the largest of the n variables), where there 
exist limiting distributions which are non-normal. The conditions under 
which this phenomenon may occur seem to deserve further study. 

A further problem belonging to the same order of ideas is to find a closer
asymptotic representation of the d.f. of the standardized sum $z_n$ than that
provided by the normal function $\Phi(x)$. Consider e.g. the simple case when the $x_\nu$
are independent variables all having the same d.f. F(x) with a finite mean m, a
finite variance $\sigma^2$ and finite moments up to a certain order $k \ge 3$. Let $G_n(x)$
be the d.f. of the variable

$$\frac{x_1 + \cdots + x_n - nm}{\sigma \sqrt{n}}.$$

It then follows from a theorem of Cramér [5], [9] that, as soon as the d.f. F(x)
contains an absolutely continuous component, there is an asymptotic expansion

$$G_n(x) = \Phi(x) + \sum_{\nu=1}^{k-3} \frac{p_\nu(x)}{n^{\nu/2}} e^{-x^2/2} + O\left(n^{-(k-2)/2}\right), \tag{11.3}$$

where the $p_\nu(x)$ are polynomials in x, and the constant implied by the O is
independent of n and x. Cramér has
also given similar expansions in more general cases, and his results have been
further extended by P. L. Hsu [39], who deduces analogous expansions also for
other functions than sums. The most general conditions under which expansions
of this type exist are still unknown.
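The effect of the first correction term in an expansion of this kind can be checked on a case where the exact d.f. is known. The sketch assumes exponential variables with mean 1 (so that the variance is 1 and the skewness is 2) and uses the standard first Edgeworth term, with the polynomial proportional to x^2 - 1; that this is the term with index 1 in the expansion above is an assumption of the example, as are the function names.

```python
import math

def Phi(x):
    """Normal d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exact_G(n, x):
    """d.f. of (x_1 + ... + x_n - n) / sqrt(n) for exponential variables with mean 1."""
    t = n + x * math.sqrt(n)
    if t <= 0:
        return 0.0
    # The sum of n such variables has the d.f. 1 - exp(-t) * sum_{j<n} t**j / j!.
    return 1.0 - math.exp(-t) * sum(t**j / math.factorial(j) for j in range(n))

def one_term_expansion(n, x, skewness=2.0):
    """Normal d.f. plus the first correction term, of order n**(-1/2)."""
    density = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return Phi(x) - skewness / (6.0 * math.sqrt(n)) * (x * x - 1.0) * density

n = 10
errors_normal = [abs(Phi(x) - exact_G(n, x)) for x in (0.0, 2.0)]
errors_expansion = [abs(one_term_expansion(n, x) - exact_G(n, x)) for x in (0.0, 2.0)]
```

Already for n = 10 the corrected value is noticeably closer to the exact d.f. than the normal approximation at the two points tried. At x = 1 and x = -1 the correction vanishes, so no improvement is to be expected there from this term alone.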

It follows from (11.3) that the difference $G_n(x) - \Phi(x)$ is, for any fixed x,
of the order $n^{-1/2}$ as $n \to \infty$. It is often important to know the asymptotic
behaviour of $G_n(x)$ when n and x increase simultaneously, and in that case (11.3)
yields only a trivial result. This case has been investigated by Cramér [10]
and Feller [29], and the results so far obtained permit important applications to
the so called law of the iterated logarithm (cf. below). However, it seems likely
that similar results may be obtained in considerably more general cases than those
hitherto investigated.

A further interesting type of problems belonging to this order of ideas may
be approached in the following way. Consider the variables (11.1) in the
particular case when $x_1, x_2, \ldots$ are independent variables all having the same
d.f. F(x). When the $a_n$ and $b_n$ can be found such that the d.f. of the normalized
sum $u_n$ tends to $\Phi(x)$, we shall say that F belongs to the domain of attraction of
the normal law. Feller's theorem gives a necessary and sufficient condition that
this should be so. Now when this condition is not satisfied, it may still occur that
the $a_n$ and $b_n$ can be so chosen that the d.f. of $u_n$ tends to a limiting d.f. $\Psi(x)$,
which is necessarily different from $\Phi(x)$. Then it is easily seen that $\Psi(x)$ must be a stable
distribution, with its c.f. defined by (9.1), and it is natural to say that F belongs
to the domain of attraction of $\Psi$. Necessary and sufficient conditions that this should
hold have been given by Doeblin [16] and Gnedenko [34]. When the $a_n$ and
$b_n$ cannot be found such that the d.f. of the normalized sum $u_n$ converges to a limit,
it may still be possible to obtain a limiting d.f. by considering only a partial
sequence $u_{n_1}, u_{n_2}, \ldots$. Khintchine [47] has proved the interesting theorem
that the totality of limiting d.f.'s that may be obtained in this way coincides
with the class of infinitely divisible d.f.'s defined by (9.2). There are also
further results in the same direction given by Bawly [2], Khintchine [44], Lévy
[61]-[63], and Gnedenko [35].


12. The law of the iterated logarithm. Consider a sequence of independent
variables $x_1, x_2, \ldots$, such that the mean $Ex_n = 0$ for all n, while the variances
$Ex_n^2 = \sigma_n^2$ are finite. Put $s_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$ and suppose that the variables
obey the central limit law with $a_n = s_n$, $b_n = 0$. (In particular this will be
the case when all $x_n$ have the same distribution.) For any function $\psi(n)$ tending
to infinity with n we then have

$$\lim_{n \to \infty} P(|z_n| > s_n \psi(n)) = 0. \tag{12.1}$$

On the other hand, if $\psi(n)$ tends to a finite limit > 0, the same probability
has a positive limit.

It seems natural to consider the relation within the brackets in (12.1) not
only for a single large value of $n$, but also to ask for the probability that this relation
holds simultaneously for an infinite number of values of $n$. The development
of this problem has led to the so-called law of the iterated logarithm.

We shall in this respect use the following terminology due to Lévy. A
non-decreasing positive function $\psi(n)$ will be said to belong to the lower class with
respect to the variables $x_n$ if, with a probability equal to one, there are infinitely
many $n$ such that


$|z_n| > s_n \psi(n).$


On the other hand, $\psi(n)$ will be said to belong to the upper class if the
probability of the same property is equal to zero.

Every $\psi(n)$ belongs to one of these two classes. This is a special case of the
so-called null-or-one law: if $S$ is a Borel set in the space of the independent random
variables $x_1, x_2, \dots$, such that any two points differing at most in a finite
number of coordinates either both belong to $S$ or both belong to the complementary
set, then $P(S)$ can only assume the values 0 or 1.

It was proved by Kolmogoroff [51] that, subject to certain restrictions, the
function


$\psi(n) = \sqrt{c \log \log s_n}$


belongs to the lower class for any $c < 2$, and to the upper class for any $c > 2$,
which may be expressed by the relation


(12.1)    $P\left( \limsup_{n \to \infty} \frac{|z_n|}{s_n \sqrt{2 \log \log s_n}} = 1 \right) = 1.$


More general results were proved by Feller [30], who proved i.a. that, subject to
certain restrictions, $\psi(n)$ belongs to the lower or upper class according as


(12.2)    $\sum_n \frac{\sigma_n^2}{s_n^2} \, \psi(n) \, e^{-\psi^2(n)/2}$


is divergent or convergent (in certain special cases, this had been previously
found by Kolmogoroff and Erdös [24]). Feller also proved a more complicated
result, which contains the above as a particular case, and from which
it follows that the simple criterion (12.2) no longer holds when the restrictions
imposed in its proof are removed.
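
As a rough numerical illustration of the criterion (12.2), the following sketch assumes identically distributed variables with unit variance, so that $\sigma_n^2 / s_n^2 = 1/n$ and $s_n = \sqrt{n}$, and takes $\psi(n) = \sqrt{c \log \log s_n}$. The function names and the cut-offs are illustrative choices, not part of the theory. Partial sums cannot prove convergence, but the sums over successive decades $10^k \le n < 10^{k+1}$ behave roughly like $k^{-c/2}$ up to logarithmic factors, which is summable over $k$ only when $c > 2$.

```python
import math

def feller_term(n: int, c: float) -> float:
    """Term of the series (12.2), assuming i.i.d. variables with unit variance."""
    s_n = math.sqrt(n)                      # s_n^2 = n when every sigma_n^2 = 1
    psi = math.sqrt(c * math.log(math.log(s_n)))
    return (1.0 / n) * psi * math.exp(-psi * psi / 2.0)

def block_sum(lo: int, hi: int, c: float) -> float:
    """Sum of the terms with lo <= n < hi."""
    return sum(feller_term(n, c) for n in range(lo, hi))

# Compare the sums over the decades [10^k, 10^(k+1)) for c below and above 2.
for c in (1.0, 3.0):
    blocks = [block_sum(10 ** k, 10 ** (k + 1), c) for k in (2, 3, 4)]
    print(c, [round(b, 4) for b in blocks])
```

For c = 3 the decade sums fall off markedly faster than for c = 1, in agreement with the statement that c > 2 gives the upper class and c < 2 the lower class.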


13. Convergence of series. For any sequence of random variables $x_n$, the
probability


$P\left( \sum_{n=1}^{\infty} x_n \text{ converges} \right)$


has a uniquely determined value. When the $x_n$ are independent, it follows from
the null-or-one law that this probability is either 0 or 1. By a theorem of
Khintchine and Kolmogoroff [48], the value 1 is assumed when and only when
the three series


$\sum_n P(|x_n| > 1), \qquad \sum_n E y_n, \qquad \sum_n D^2 y_n$


are convergent, where


$y_n = x_n$ when $|x_n| \le 1$,
$y_n = 0$ when $|x_n| > 1$.


For the case when the $x_n$ are not assumed to be independent, various results
have been given by Lévy [63] and others, but our knowledge of the properties
of these series is still not very advanced.
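
The three-series condition can be checked by hand in simple cases. The sketch below assumes the hypothetical example $x_n = \pm n^{-\alpha}$ with independent fair signs, for which the three series can be written down exactly; the function name and the choice of example are illustrative assumptions.

```python
import math

def three_series_partial_sums(alpha: float, n_terms: int):
    """Partial sums of the three series for x_n = +/- n**(-alpha), fair signs.

    Returns (sum P(|x_n| > 1), sum E y_n, sum D^2 y_n) over n = 1..n_terms.
    """
    sizes = [n ** (-alpha) for n in range(1, n_terms + 1)]
    tail = sum(1.0 for a in sizes if a > 1.0)     # P(|x_n| > 1) is 0 or 1 here
    mean = 0.0                                    # E y_n = 0 by symmetry of the signs
    var = sum(a * a for a in sizes if a <= 1.0)   # D^2 y_n = a_n^2 when |x_n| <= 1
    return tail, mean, var

print(three_series_partial_sums(1.0, 100000))   # variance series stays bounded
print(three_series_partial_sums(0.5, 100000))   # variance series keeps growing
```

For alpha = 1 the variance series is the convergent series of the $1/n^2$, so the random series converges with probability 1; for alpha = 1/2 it is the harmonic series, and the probability of convergence is 0.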


14. Generalizations. In several instances it has been pointed out above 
that the results concerning sums of independent variables may, to a certain 
extent, be extended to cases when the variables are not independent. Generally 
the independence condition has then to be replaced by some condition restricting 
the degree of dependence. Results of this type were first given by Bernstein
[3], and then in more general cases by Lévy [62], [63], and Loève [67]. However,
this field has so far only been very incompletely explored. 

Similar remarks apply to the generalization of the various theorems quoted 
above to cases of variables and distributions in more than one dimension. 


III. STOCHASTIC PROCESSES 


15. The theory of random variables in a finite number of dimensions is able 
to deal adequately with practically all problems considered in classical prob- 
ability theory. However, during the early years of the present century, there 
appeared in the applications various problems, where it proved necessary to 
consider probability relations bearing on infinite sequences of numbers, or even 
on functions of a continuous variable. 

The mathematical set-up required for the study of such problems involves 
the introduction of probability distributions in spaces of random sequences or 
random functions (cf. 5 above). Generally, any process in nature which can be 
analyzed in terms of probability distributions in spaces of these types will be 
called a stochastic process. It is convenient to apply this name also to the prob- 
ability distribution used for the study of the process. We shall thus say, e.g., 
that a certain random function x(t) is attached to the stochastic process which 
is defined by the probability distribution of x(t). In the majority of
applications, the variable t will represent the time, and we shall often use a terminology
directly referring to this case. However, there are also other types of problems
in the applications (t may e.g. be a spatial variable in an arbitrary number of
dimensions), and it is obvious that the purely mathematical problems connected
with these classes of probability distributions will have to be considered quite
independently of any concrete interpretation of the variable t or the function x(t).

A well-known example of this type of problems is afforded by the Brownian 
movement. Let x(t) be the abscissa at the time t of a small particle immersed
in a liquid, and subject to molecular impacts. In every instant, the quantity
x(t) receives a random impulse, and the problem arises to study the behaviour
of x(t). According as we are content to consider x(t) for a discrete sequence
of t-points, say for t = 0, 1, 2, ..., or we wish to consider all positive values of t,
we shall then have to introduce a probability distribution in the space of the
random sequence x(0), x(1), ..., or in the space of the random function x(t)
where t > 0. We may then discuss such questions as the distribution of x(t)
for a given value of t, the joint and conditional distributions of x(t) for two or
more values of t, and, in the case of a continuous variable t, continuity,
differentiability and other similar properties of the random function x(t).

Wiener [82], [83] (cf. also Paley and Wiener [74]) was the first to give a rigorous 
treatment of this process. He proved in 1923 that it is possible to define a 
probability distribution in a suitably restricted functional space, such that the 
increment $\Delta x(t) = x(t + \Delta t) - x(t)$ is independent of x(t) for any $\Delta t > 0$.
With a probability equal to 1, the function x(t) is continuous for all t > 0, and
for any fixed t > 0, the random variable x(t) is normally distributed.
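
The defining properties just stated suggest a simple way of simulating the process on a grid of t-points. The sketch below builds x(t) from independent normal increments, under the assumed normalization that x(t) has variance t (the text leaves the variance parameter open); the function name, grid and sample size are illustrative choices.

```python
import math
import random

def wiener_path(t_max: float, n_steps: int, rng: random.Random) -> list:
    """Values x(0), x(dt), ..., x(t_max) built from independent normal increments."""
    dt = t_max / n_steps
    x = [0.0]
    for _ in range(n_steps):
        # Each increment is N(0, dt) and independent of the path so far.
        x.append(x[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return x

rng = random.Random(0)
endpoints = [wiener_path(1.0, 10, rng)[-1] for _ in range(20000)]
mean = sum(endpoints) / len(endpoints)
var = sum((v - mean) ** 2 for v in endpoints) / len(endpoints)
print(round(mean, 3), round(var, 3))   # sample mean and variance of x(1)
```

The sample mean and variance of x(1) should come out close to 0 and 1 respectively, whatever number of grid steps is used, since sums of independent normal increments are again normal.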

Another example of stochastic processes studied at this stage occurs in the 
theory of risk of an insurance company. Let x(t) denote the total amount
of claims up to the time t in a certain insurance company. As in the case of
the Brownian movement, it may seem natural to assume that the increment
$\Delta x(t)$ is independent of x(t). On the other hand, x(t) is in this case an
essentially discontinuous function, which is never decreasing, and increases only by
jumps of varying magnitudes occurring for certain discrete values of t, which are
not a priori known. Processes of this type were studied by F. Lundberg [69], 
[70], H. Cramér [6] and others. 
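
A path of such a claims process can be sketched as follows. The code assumes, purely for illustration, that claims arrive at the jump times of a Poisson process and that the claim amounts are independent and exponentially distributed; neither assumption is made in the text, and all names and parameter values are hypothetical.

```python
import random

def claims_path(t_max: float, rate: float, mean_claim: float, rng: random.Random):
    """Jump times and running totals of x(t) on [0, t_max] (compound Poisson sketch)."""
    t, total = 0.0, 0.0
    times, totals = [0.0], [0.0]
    while True:
        t += rng.expovariate(rate)                    # waiting time to the next claim
        if t > t_max:
            break
        total += rng.expovariate(1.0 / mean_claim)    # claim amount (assumed exponential)
        times.append(t)
        totals.append(total)
    return times, totals

times, totals = claims_path(10.0, 2.0, 1.5, random.Random(1))
print(len(times) - 1, round(totals[-1], 2))   # number of claims and x(10)
```

Between two consecutive jump times the function x(t) is constant, so the two lists describe the whole path; under these assumptions the mean of x(t) is rate * t * mean_claim.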

Further examples of particular processes were discussed in connection with 
various applications, but no general theory of the subject existed until 1931, 
when Kolmogoroff published a basic paper [53] dealing with the class of stochastic 
processes which will here be denoted as Markoff processes (Kolmogoroff uses the 
term “stochastically definite processes”), of which the two examples mentioned 
above form particular cases. The theory of this class of processes was further 
developed by Feller [26], [28]. In 1934, Khintchine [42] introduced another 
important class of processes known as stationary processes. From 1937, the 
general theory of the subject was subjected to a penetrating analysis in a series 
of important works by Doob [18]-[22].* 


16. Probability distributions in functional spaces. We have seen in 5
above how a probability distribution in the space of all functions x(t) may be
defined, when t varies in an arbitrary space T. Generally, we shall here
content ourselves with considering the cases when T is the set of all real numbers, or the
set of all non-negative real numbers. Most results obtained for these cases
will be readily generalized to cases when t varies in a Euclidean space of a finite
number of dimensions. On the other hand, when T is enumerable, say
consisting of the points $t = 0, \pm 1, \pm 2, \dots$, so that we are concerned with a random
sequence $x(0), x(\pm 1), \dots$, the results for the continuous case will generally
hold and assume a simpler form which will not be particularly stated here.
The case when T is a space of an infinite number of dimensions does not seem
to have been considered so far.

3 A further interesting paper by Doob has appeared while the present paper was being
printed: “Probability in function space”, Bull. Amer. Math. Soc., Vol. 53 (1947), pp. 15-30.

In the present paragraph, it will be convenient to assume the function x(t)
to be real-valued, but the generalization to a complex-valued x(t) requires 
only obvious modifications. In the sequel we shall sometimes consider the 
real-valued and sometimes the complex-valued case, according as the occasion 
requires. 

Let now X be the space of all real-valued functions x(t) of the real variable
t, where $-\infty < t < \infty$. According to 5, a probability measure P(S) is uniquely
defined for all Borel sets S in X by means of the family of joint distributions
of all finite sequences $x(t_1), \dots, x(t_n)$. In fact, P(S) can be defined for a more
general class of sets than the Borel sets. For any set S in X, we may define
an outer P-measure $\bar{P}(S)$ as the lower bound of P(Z) for all sums Z of finite or
enumerable sequences of intervals, such that $S \subset Z$. Further, the inner
P-measure $\underline{P}(S)$ is defined by the relation $\underline{P}(S) = 1 - \bar{P}(X - S)$. When the
outer and inner measures are equal, S is called P-measurable, and P(S) is defined
as their common value. Any P-measurable set differs from a Borel set by a
set of P-measure zero.

In many cases, this definition will be sufficient for an adequate treatment 
of the problems that we wish to consider. However, in other cases we encounter 
certain characteristic difficulties, which make it desirable to consider the
possibility of amending the basic definition. Thus it often occurs that we are
interested in the probability that the random function x(t) satisfies certain
regularity conditions in a non-enumerable set of points t. We may, e.g., wish
to consider the probability that x(t) is continuous for all t, that x(t) should
be Lebesgue-measurable for all t, that x(t) < k for all t, etc. Let S denote the
set of all functions satisfying a condition of this type. It can then be shown
that the inner measure $\underline{P}(S)$ is always equal to zero, so that S is never
measurable, except in the (usually trivial) case when $\bar{P}(S) = 0$.

Consequently many interesting probabilities are left undetermined by the 
general definition of a probability distribution in X given above. The
possibility of modifying the definition so as to enable us to study probabilities of
this type has been thoroughly investigated by Doob [18]. He considers a
subspace $X_0$ of the general functional space X, where $X_0$ is chosen so as to
contain only, or almost only, “desirable” functions, i.e. functions satisfying
such regularity conditions as seem natural with respect to the problem under
investigation. We start from a given probability measure P(S) in X, and ask
if it is possible to define a probability measure in the restricted space $X_0$, which
corresponds in some natural way to the given distribution in X. Let $S_0$ be
a set in $X_0$, and suppose that it is possible to find a P-measurable set S in X
such that $S X_0 = S_0$. According to Doob, a probability measure $P_0$ in $X_0$
is then uniquely defined by the relation


$P_0(S_0) = P(S)$


if and only if the condition


$\bar{P}(X_0) = 1$


is satisfied.

The problem is thus reduced to finding a subspace $X_0$ of outer P-measure 1,
such that $X_0$ contains only functions of sufficiently regular behaviour. When
this can be done, we can restrict ourselves to consider only functions x(t)
belonging to $X_0$, the probability distribution in this space being defined by the
measure $P_0$. We shall then say that x(t) is a random function, attached to a
stochastic process with the restricted space $X_0$. Doob has obtained a great
number of interesting results in this connection, e.g. with respect to the problem
of choosing $X_0$ such that it contains almost only Lebesgue-measurable functions,
or such that the probability of the relation $x(t) \le k$ has a well-defined value for
all k. In particular he has shown that the last problem can be solved for
any given P-measure. However, our knowledge of the various possibilities
which exist with respect to the choice of $X_0$ is still very incomplete, and it seems
likely that further important results may be reached along this line of research. 

An alternative method of introducing probability distributions in functional 
spaces has been used by Wiener [82], [83], (cf. also Paley and Wiener, [74]). 
Consider a given probability measure $\Pi$ in an arbitrary space $\Omega$, defined for all
sets $\Sigma$ of an additive class C. Let $x(t, \omega)$ denote a function (real- or
complex-valued, as the case may be) of the arguments t (real) and $\omega$ (point in $\Omega$), such that
$x(t, \omega)$ for every fixed t becomes a C-measurable function of $\omega$. On the other
hand, when $\omega$ is fixed, $x(t, \omega) = x(t)$ reduces to a function of the real variable t.
Let $X_0$ denote the set of all functions x(t) corresponding in this way to points of
$\Omega$. Further, let $S_0 = S X_0$, where S is a Borel set in X, and let $\Sigma$ denote the set
of all points $\omega$ such that $x(t, \omega) \subset S_0$. Then $\Sigma$ belongs to C, and a probability
measure $P_0$ in the functional space $X_0$ is uniquely defined by the relation


(16.1)    $P_0(S_0) = \Pi(\Sigma).$


The relations between the two modes of definition have been discussed by 
Doob and Ambrose [23] who have shown that they are largely equivalent. 
However, it seems likely that in particular problems the one or the other pro- 
cedure may sometimes be the more advantageous, and further investigations 
on this subject seem desirable. 


17. Processes with a finite mean square. Consider a stochastic process 
defined by a probability measure P(S) in the space X of all complex-valued 
functions x(t) of the real variable t. For any fixed $t_0$, the random variable
$x(t_0)$ is then a complex-valued function of the variable point x(t) in the space
X, i.e. a point $Q_{t_0}$ in the space Q of all complex-valued functions defined on X.
When $t_0$ varies, the point $Q_{t_0}$ describes a “curve” in Q, which then corresponds
to our stochastic process.

Suppose, in particular, that the mean square


$E|x(t)|^2 = \int_X |x(t)|^2 \, dP$


is finite for any fixed value of t. This implies that for fixed t the function
x(t) belongs to $L_2$ over X, relative to the probability measure P. The random
variable x(t) may then be regarded as an element of the Hilbert space H of all
complex-valued functions f belonging to $L_2$ over X, the inner product (f, g) of
two elements f and g being defined by the relation


$(f, g) = \int_X f \bar{g} \, dP = E(f \bar{g}).$


The stochastic process to which x(t) is attached then corresponds to a “curve”
in H (Kolmogoroff, [56], [57]), so that the well-known theory of Hilbert space is 
available for the study of the process. In particular, convergence in the usual 
metric of Hilbert space is equivalent to convergence in the mean of order 2 for 
random variables. 

Let $H_x$ be the smallest closed linear subspace of H which contains all elements
of the form $a_1 x(t_1) + \dots + a_n x(t_n)$. If the covariance function


$r(t, u) = (x(t), x(u)) = E(x(t)\overline{x(u)})$


is continuous for all real values of t and u, then $x(t) \to x(t_0)$ in the mean, as
$t \to t_0$, and we shall say that the process x(t) is continuous. For any continuous
process, $H_x$ is separable. When g(t) is a continuous non-random function of t,
and x(t) is attached to a continuous stochastic process, the Riemann-Darboux
sums formally associated with the integral


$\int_a^b g(t) x(t) \, dt$


are easily shown to tend to a limit y, which is an element of $H_x$, i.e. a random
variable. By definition, we may identify the integral with this variable y,
and this integral will possess the essential properties of the ordinary Riemann 
integral (Cramér, [12]). 

The application of the theory of Hilbert space to stochastic processes seems 
to open very interesting possibilities. Some applications to particular classes 
of stochastic processes will be mentioned below. Further important results
belonging to this order of ideas will be given in a work by K. Karhunen [40], which
is in course of publication. 


18. Relations to ergodic theory. There is a close connection between the 
theory of stochastic processes and ergodic theory. In ergodic theory, as
summarized e.g. in the treatise of E. Hopf [38], we consider an arbitrary space $\Omega$,
and a probability measure $\Pi$, defined for all sets $\Sigma$ belonging to the additive
class C. We further consider a one-parameter group of one-one transformations
of $\Omega$ into itself (a “flow” in $\Omega$) such that the transformation corresponding to
the parameter value t takes the point $\omega = \omega_0$ into $\omega_t$, while $(\omega_t)_u = \omega_{t+u}$. Let
$f(\omega)$ be a given function, defined throughout $\Omega$, and such that $f(\omega_t)$ is
C-measurable for every fixed t. The well-known ergodic theorems due to von Neumann,
Birkhoff, Khintchine and others are then concerned with the asymptotic
behaviours of mean values, which in the classical cases are of the types


$\frac{f(\omega_0) + f(\omega_1) + \dots + f(\omega_{n-1})}{n}$


or


$\frac{1}{T} \int_0^T f(\omega_t) \, dt,$


as n or T tends to infinity. (In the case of the latter expression, it is necessary
to introduce some additional condition implying measurability in t.)

Writing $x(t, \omega) = f(\omega_t)$, it is seen that to a given transformation group $\omega \to \omega_t$
and a given function $f(\omega)$, there corresponds a stochastic process in the sense of
Wiener’s definition (cf. 16). The space $X_0$ of this process consists of all functions
x(t) representable in the form $x(t) = f(\omega_t)$, when $\omega = \omega_0$ varies over $\Omega$. The
corresponding probability measure $P_0$ is defined by (16.1).

Thus any of the above-mentioned ergodic theorems may be expressed as a
theorem concerning “temporal” mean values of the types


$\frac{x(0) + x(1) + \dots + x(n-1)}{n}$


or


$\frac{1}{T} \int_0^T x(t) \, dt.$


If, according to some reasonable convergence definition, we may assign a limit
to either of these expressions, as n or T tends to infinity, this limit will be a
random variable, and it is important to find conditions which imply that this
variable has a constant value for “almost all” functions x(t), i.e. for all x(t)
except at most a set of $P_0$-measure zero.

In the particular case when x(0), x(1), ... are independent variables all
having the same distribution, the classical ergodic theorems yield simple cases
of the laws of large numbers (cf. 10). The mean ergodic theorem of von
Neumann gives the weak law, while the Birkhoff-Khintchine theorem gives the
strong law. Some more general results belonging to this order of ideas will be
mentioned in the sequel.

It will be seen that the two theories are largely equivalent, and it seems 
likely that further comparative studies of the methods will be of great value to 
both sides. 
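
The correspondence between a flow and a stochastic process may be illustrated by a very simple example, chosen here only for concreteness: the space is the interval [0, 1) with the uniform measure, the flow is the rotation by an irrational angle (considered for integer values of the parameter only), and f is a cosine. All names and numerical values in the sketch are assumptions made for the illustration.

```python
import math

ALPHA = math.sqrt(2.0) - 1.0          # irrational rotation angle (illustrative choice)

def flow(omega: float, t: int) -> float:
    """omega_t: the point omega moved t steps along the rotation of [0, 1)."""
    return (omega + t * ALPHA) % 1.0

def f(omega: float) -> float:
    """A function on the space whose mean with respect to the uniform measure is 0."""
    return math.cos(2.0 * math.pi * omega)

def time_average(omega0: float, n: int) -> float:
    """(x(0) + x(1) + ... + x(n-1)) / n with x(t) = f(omega_t)."""
    return sum(f(flow(omega0, t)) for t in range(n)) / n

for n in (10, 100, 1000, 10000):
    print(n, time_average(0.3, n))
```

In this example the temporal mean tends to the constant 0, the space mean of f, for every starting point; this is the situation in which the limiting random variable has a constant value for almost all functions x(t).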







19. Markoff processes. Consider now a stochastic process, defined by a 
probability measure P(S) in the space X of all real-valued functions x(t) of the
real variable t. For any $t_1 < t_2$ there is a certain conditional probability
$P(x(t_2) \subset S \mid x(t_1) = a_1)$ of the relation $x(t_2) \subset S$, relative to the hypothesis
that $x(t_1)$ assumes the given value $a_1$. Suppose now that this conditional
probability is independent of any additional hypothesis concerning the behaviour of
x(t) for $t < t_1$, so that we have e.g. for any $t_0 < t_1 < t_2$ and for any $a_0$


$P(x(t_2) \subset S \mid x(t_1) = a_1) = P(x(t_2) \subset S \mid x(t_1) = a_1, \; x(t_0) = a_0).$


In this case the process is called a Markoff process. 

The general theory of this type of processes, which forms a natural
generalization of the classical concept of Markoff chains, has been studied in basic
works by Kolmogoroff [53] and Feller [26], [28]. Writing


$P(x(t) \le \xi \mid x(t_0) = a_0) = F(\xi; t, a_0, t_0),$


where $t_0 < t$, F will be the distribution function of the random variable x(t),
relative to the hypothesis $x(t_0) = a_0$. Then F satisfies the
Chapman-Kolmogoroff equation


(19.1)    $F(\xi; t, a_0, t_0) = \int_{-\infty}^{\infty} F(\xi; t, \eta, t_1) \, d_\eta F(\eta; t_1, a_0, t_0),$


which expresses that, starting from the state $x(t_0) = a_0$, the state $x(t) \le \xi$
must be reached by passing through some intermediate state $x(t_1) = \eta$, where
$t_0 < t_1 < t$. Subject to certain general conditions, it is possible to show that
any solution of this equation satisfies certain integro-differential equations,
which in some important cases reduce to partial differential equations of
parabolic type, and that the d.f. F is uniquely determined by these equations.
However, the general conditions mentioned above are in many cases difficult to apply
to particular classes of processes, and it would be important to have further 
investigations concerning these questions. 

Markoff processes (not belonging to the subclass of differential processes, 
which will be considered in the following paragraph) appear in several important 
applications, e.g. in the theory of cosmic radiation, in certain genetical problems, 
in the theory of insurance risk etc. In these cases, we are often concerned with 
the class of purely discontinuous Markoff processes, where the function x(t)
only changes its value by jumps. If, in addition, there is only a finite or
enumerable set of possible values for x(t), the Chapman-Kolmogoroff equation
(19.1) reduces to


(19.2)    $\pi_{ik}(t_0, t) = \sum_j \pi_{ij}(t_0, t_1) \, \pi_{jk}(t_1, t),$


where $\pi_{ik}(t_0, t)$ denotes the “transition probability”, i.e. the probability that
x(t) will be in the kth state at the time t, when it is known to have been in the
ith state at the time $t_0$. In matrix form, this equation may be written


(19.3)    $\Pi(t_0, t) = \Pi(t_0, t_1) \, \Pi(t_1, t),$


where $\Pi$ denotes the matrix of the $\pi_{ik}$.

When only a sequence of discrete values of t is considered, we have here
the classical case of Markoff chains, which has received a detailed treatment
in the well-known book by Fréchet [32] (cf. also Doob, [19]). The case when t
is a continuous variable has been treated by Feller [28], O. Lundberg [71],
Arley [1], and other authors. Some of the most important problems of this
branch of the subject are concerned with the existence of a unique system of
solutions of (19.2) or (19.3), and with the asymptotic behaviour of the
solutions for large values of $t - t_0$. Though important results have been reached,
there still remains much to be done here, and the same thing holds a fortiori 
with respect to the analogous problems for general Markoff processes. 
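
The matrix relation (19.3) can be verified numerically in the simplest non-trivial case. The sketch below assumes a process with two states which is temporally homogeneous, with constant jump intensities A and B between the states; for such a process the transition matrix is known in closed form. The intensities and the time points are hypothetical values chosen for the illustration.

```python
import math

A, B = 0.7, 0.3   # jump intensities 1 -> 2 and 2 -> 1 (hypothetical values)

def transition_matrix(t0: float, t: float):
    """Pi(t0, t) for a temporally homogeneous two-state process (closed form)."""
    s = A + B
    decay = math.exp(-s * (t - t0))
    return [[(B + A * decay) / s, (A - A * decay) / s],
            [(B - B * decay) / s, (A + B * decay) / s]]

def matmul(p, q):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(p[i][j] * q[j][k] for j in range(2)) for k in range(2)]
            for i in range(2)]

left = transition_matrix(0.0, 2.5)
right = matmul(transition_matrix(0.0, 1.0), transition_matrix(1.0, 2.5))
print(left)
print(right)
```

The two printed matrices agree up to rounding errors, and as $t - t_0$ increases each row of the matrix tends to the same limiting distribution (B, A) / (A + B), which is a simple instance of the asymptotic behaviour referred to above.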


20. Differential processes. A particularly interesting case of a Markoff 
process arises when, for any $\Delta t > 0$, the increment $\Delta x(t) = x(t + \Delta t) - x(t)$
is independent of $x(\tau)$ for $\tau \le t$. The process is then called a differential process.
Some of the earliest studied stochastic processes belong to this class, which 
contains in particular the two examples discussed above in 15. Further cases 
of such processes arise e.g. in the theory of radioactive disintegration and in 
telephone technique. 

Let us suppose that x(0) is identically equal to zero, and that the process is
uniformly continuous in probability in every finite interval $0 \le t \le T$, i.e.
that for any fixed positive $\varepsilon$


$P(|x(t + \Delta t) - x(t)| > \varepsilon) \to 0$


as $\Delta t \to 0$, uniformly for $0 \le t \le T$. Then it follows from the works of Lévy
[60], [63], Khintchine [47] and Kolmogoroff [54] that, for any t > 0, the random
variable x(t) has an infinitely divisible distribution, with a characteristic
function $g(z; t)$ given by (9.2), where $\beta$, $\gamma$, $M(u)$ and $N(u)$ may depend on t.

In the particularly important case when the distribution of the increment 
$x(t + \Delta t) - x(t)$ does not involve t, but only depends on the length $\Delta t$ of the
interval, we say that the process is temporally homogeneous, and in this case
we have


$\log g(z; t) = t \log g(z; 1),$


so that we obtain the general formula for $g(z; t)$ simply by replacing in (9.2)
$\beta$, $\gamma$, $M(u)$ and $N(u)$ by $t\beta$, $t\gamma$, $tM(u)$ and $tN(u)$ respectively.
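
The relation between g(z; t) and g(z; 1) is easily checked in a concrete case. The sketch below assumes a compound Poisson process, for which $\log g(z; t) = \lambda t (\varphi(z) - 1)$ with $\varphi$ the c.f. of a single jump; the jump law (exponential with mean 1) and the intensity are hypothetical choices, and the formula (9.2) itself is not used.

```python
import cmath

RATE = 1.5                      # expected number of jumps per unit time (hypothetical)

def jump_cf(z: float) -> complex:
    """c.f. of one jump; jumps are taken exponential with mean 1 (an assumption)."""
    return 1.0 / (1.0 - 1j * z)

def log_cf(z: float, t: float) -> complex:
    """log g(z; t) for the compound Poisson process with the jump law above."""
    return RATE * t * (jump_cf(z) - 1.0)

def cf(z: float, t: float) -> complex:
    """g(z; t), the characteristic function of x(t)."""
    return cmath.exp(log_cf(z, t))

z = 0.8
print(log_cf(z, 3.0), 3.0 * log_cf(z, 1.0))      # log g(z; t) = t log g(z; 1)
print(cf(z, 1.7), cf(z, 1.2) * cf(z, 0.5))       # independent increments multiply
```

The second line of output expresses the same fact in multiplicative form: since the increments over adjoining intervals are independent, the c.f. of x(s + t) is the product of the c.f.'s of x(s) and x(t).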

When $t \to \infty$, or $t \to 0$, the appropriately normalized distribution of x(t)
tends, under certain conditions, to a stable distribution (Cramér [7],
Gnedenko [36]). When this limiting distribution is normal, there are sometimes
even asymptotic expansions analogous to (11.3). Still, the problem of the
asymptotic behaviour of the distribution for large t does not seem to be definitely
cleared up. 

Khintchine [41] and Gnedenko [37] have given interesting generalizations 
of the law of the iterated logarithm (cf. 12) to processes of the type considered 
here. 


The continuous process discussed in 15 in connection with the Brownian
movement corresponds to the temporally homogeneous case when $\beta$, $M(u)$ and
$N(u)$ all reduce to zero, so that


$g(z; t) = e^{-\gamma t z^2},$


which shows that the distribution of x(t) is normal, with mean zero and
variance $2\gamma t$.

On the other hand, in the applications to the theory of insurance risk, $\gamma$ is
zero, while $M(u)$ and $N(u)$ are connected with the distribution of the various
magnitudes of claims. In this type of applications, it is often very important
to find the probability that x(t) satisfies an inequality of the form


$x(t) < a + bt$


for all values of t. It follows from the discussion in 16 that the definition of
a probability of this type is somewhat delicate. The problem, which can be
regarded as an extended form of the classical problem of “the gambler’s ruin,”
has been solved in certain particular cases. It leads to integral equations,
which in the simplest case are of the Volterra, in other cases of the
Wiener-Hopf type (Cramér [6], [13], Segerdahl [79], Täcklind [81]).
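
One of the particular cases in which the problem has an explicit solution is that of Poisson claim arrivals with exponentially distributed claim amounts. The sketch below, which rests entirely on these two assumptions, compares the classical closed-form probability that the inequality is violated for some t with a Monte Carlo estimate over a long but finite time horizon. The parameter values, the horizon and the function names are hypothetical.

```python
import math
import random

def ruin_before(horizon, a, b, rate, mean_claim, rng):
    """True if x(t) >= a + b*t at some claim time t <= horizon.

    Between claims x(t) is constant while a + b*t increases (b > 0),
    so the inequality can only be violated at the instant of a claim.
    """
    t, total = 0.0, 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return False
        total += rng.expovariate(1.0 / mean_claim)
        if total >= a + b * t:
            return True

def ruin_probability_exponential(a, b, rate, mean_claim):
    """Closed-form ruin probability for exponential claims, valid when rho < 1."""
    rho = rate * mean_claim / b
    return rho * math.exp(-(1.0 - rho) * a / mean_claim)

rng = random.Random(0)
n_paths = 4000
hits = sum(ruin_before(100.0, 2.0, 1.5, 1.0, 1.0, rng) for _ in range(n_paths))
estimate = hits / n_paths
exact = ruin_probability_exponential(2.0, 1.5, 1.0, 1.0)
print(round(estimate, 3), round(exact, 3))
```

The probability required in the text is the complement, 1 minus the ruin probability. The finite horizon makes the simulated value a slight underestimate, but with the positive drift assumed here the contribution of later times is negligible.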


21. Orthogonal processes. Consider now the case of a complex-valued
x(t), and suppose that $E|x(t)|^2$ is finite for all t. Without restricting the
generality, we may assume that $Ex(t) = 0$ for all t.

Suppose now that instead of requiring, as in the case of a differential process,
that the variables $x(\tau)$ and $\Delta x(t)$ should be independent when $\tau \le t$, we only
lay down the less stringent condition that these variables should be
non-correlated, i.e. that


$E(x(\tau) \overline{\Delta x(t)}) = 0.$


We then obtain a process which is no longer necessarily of the Markoff type.
The condition implies that, for any two disjoint intervals $(t_1, t_2)$ and $(t_3, t_4)$,
we have


$E[(x(t_2) - x(t_1)) \overline{(x(t_4) - x(t_3))}] = 0,$


so that the “chords” corresponding to two disjoint “arcs” of the curve in
Hilbert space representing the process are always orthogonal (Kolmogoroff
[56], [57]). A process of this type may accordingly be called an orthogonal
process.

For a process of this type we have, writing $E|x(t)|^2 = F(t)$,
$F(t + \Delta t) - F(t) = E|x(t + \Delta t) - x(t)|^2$, so that $F(t)$ is a never decreasing function of t.
If $F(t)$ is bounded for all t, we shall say that the orthogonal process is bounded.
For a bounded orthogonal process, the Stieltjes integral


$\int g(t) \, dx(t),$


where g(t) is bounded and continuous, may be defined as the limit in the mean
of sums of the form


$\sum_k g(t_k) (x(t_k) - x(t_{k-1})).$


22. Stationary processes. When we are concerned with a process representing 
the temporal development of a system governed by laws which are invariant 
under a translation in time, it seems natural to assume that the joint
distribution of any group of variables of the form


(22.1)    $x(t_1 + \tau), \dots, x(t_n + \tau)$


is independent of $\tau$. A process satisfying this condition will be called a
stationary process. If a stochastic process is defined by means of a “flow” $\omega \to \omega_t$
in a space $\Omega$ (cf. 18), the process will be stationary when and only when the
corresponding flow is measure-preserving, i.e. if the transformation $\omega \to \omega_t$
changes any C-measurable set S into a set $S_t$ of the same measure.

Under appropriate conditions with respect to the measurability of x(t), the
Birkhoff-Khintchine ergodic theorem holds for a stationary process, i.e. there
exists a random variable y such that we have


(22.2)    $P_0\left( \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t) \, dt = y \right) = 1,$


where $P_0$ is the probability measure in a suitably restricted space in the sense
of Doob. Further work seems to be required here, in order to make the situa- 
tion quite clear, also with regard to metric transitivity. 

For a stationary process, any finite moment of the joint distribution of the 
variables (22.1) is obviously independent of $\tau$. Suppose now that we only
require that this invariance under translations in time should hold for moments
of the first and second order of the joint distributions, which are assumed to
be finite. The wider class of processes obtained in this way may be called
stationary of the second order. Processes of this type were first studied
by Khintchine [42]. We shall assume that x(t) is complex-valued.
Without restricting the generality, we may further assume that $Ex(t) = 0$ for
all t. The product moment $E(x(t)\overline{x(u)})$ will then be a function of the difference
$t - u$:


(22.3)    $E(x(t)\overline{x(u)}) = R(t - u).$


Assuming, in addition, that R(t) is continuous at t = 0, it follows that R(t)
is continuous for all t, and the process is continuous in the sense of 17. It was
shown by Khintchine that a NS condition that a given function R(t) should
be associated with a second order stationary and continuous process by means of
the relation (22.3) is that we should have


(22.4)    $R(t) = \int_{-\infty}^{\infty} e^{itx} \, dF(x)$


for all t, where the spectral function F(x) is real, never decreasing and bounded.
In particular, we have


$F(+\infty) - F(-\infty) = R(0) = E|x(t)|^2 = \sigma^2.$


Khintchine’s condition for R(t) was generalized by Cramér to the case of an
arbitrary number of processes $x_1(t), \dots, x_n(t)$, such that the product moments
$E(x_i(t)\overline{x_j(u)})$ are functions of the difference $t - u$. The corresponding spectral
functions $F_{ij}(x)$ are in general complex-valued and of bounded variation.
Further, the expression (Cramér, [12])


$\sum_{i,j=1}^{n} z_i \bar{z}_j \, \Delta F_{ij},$


where $\Delta F_{ij} = F_{ij}(b) - F_{ij}(a)$, is, for any $a < b$, a non-negative Hermitian form in
the variables $z_i$. This result is closely connected with a theorem on Hilbert
space considered by Kolmogoroff and Julia. It is further shown that, to any
given functions $F_{ij}(x)$ $(i, j = 1, \dots, n)$ satisfying these conditions, we can
always find n processes $x_1(t), \dots, x_n(t)$ such that the joint distribution of any set
of variables $x_i(t_j)$ is always normal, while the covariance functions
$R_{ij}(t - u) = E(x_i(t)\overline{x_j(u)})$ are given by the expression


$R_{ij}(t) = \int_{-\infty}^{\infty} e^{itx} \, dF_{ij}(x).$


For a process x(t) which is continuous and stationary of the second order, 
with Ex(t) = 0 for all ¢, we have the mean ergodic theorem 


T 

(22.5) lim. of } x(t) dt=y 

for any real λ. The random variable y has the mean 0 and the variance F(λ + 0)
− F(λ − 0), where F is the spectral function appearing in (22.4). If λ is a
point of continuity for F, it thus follows that y = 0 with a probability equal
to 1. On the other hand, if λ is a discontinuity, y has a positive variance. Let
$\lambda_1, \lambda_2, \ldots$ be all the discontinuities of F(x), and let $\sigma_1^2, \sigma_2^2, \ldots$ be the
corresponding saltuses, while $y_1, y_2, \ldots$ are the limits in the mean obtained from
(22.5) for $\lambda = \lambda_1, \lambda_2, \ldots$. Then two different $y_j$ are always orthogonal:
$E(y_j \bar{y}_k) = 0$ for j ≠ k, and we have


(22.6)    $x(t) = \sum_{\nu} y_\nu\, e^{i\lambda_\nu t} + \xi(t),$


where $E\,\xi(t) = 0$ and

$E\,|\xi(t)|^2 = \sigma^2 - \sum_{\nu} \sigma_\nu^2.$

If F(x) is a step-function, we have $\sigma^2 = \sum_\nu \sigma_\nu^2$, and it follows that ξ(t) = 0
with a probability equal to 1, so that (22.6) gives a "stochastic Fourier
expansion" of x(t) (Slutsky, [80]).
Even when F(x) is arbitrary, we can obtain a spectral representation of x(t)
generalizing (22.6). In fact, it can be shown (Cramér, [14]) that x(t) can always
be represented by a Fourier-Stieltjes integral


(22.7)    $x(t) = \int_{-\infty}^{\infty} e^{itu}\, dz(u),$


where z(u) is a random function attached to a bounded orthogonal process
(cf. 21), such that


$E\,|z(u + \Delta u) - z(u)|^2 = F(u + \Delta u) - F(u).$

Conversely, we have


(22.8)    $z(u + \Delta u) - z(u) = -\int_{-\infty}^{\infty} \frac{e^{-it(u + \Delta u)} - e^{-itu}}{2\pi i t}\, x(t)\, dt,$


so that there is a one-one correspondence between x(t) and Δz(u). The integrals
(22.7) and (22.8) are defined as limits in the mean, as shown above in 17 and 21.
These results are in close correspondence with generalized harmonic analysis for
an arbitrary function, as developed by Wiener [83] and Bochner [4]. The spectral
representation of a stochastic process has important applications, some of
which will be considered in a forthcoming paper by Karhunen [40]. An extension
of the spectral representation to a more general class of processes has been
given by Loève [68].

When, in particular, the x(t) process is such that the joint distribution of any
group of variables $x(t_1), \ldots, x(t_n)$ is normal, it follows that any increment
Δz(u) is normally distributed. Since two uncorrelated normally distributed
variables are always independent, it follows that in this case the z(u) process 
is a differential process with normally distributed increments. Important 
results for this case have recently been given by Doob [22]. 

The properties of continuity, differentiability etc. for processes of the type 
here considered are still incompletely known, and further work is required. 
A further group of important unsolved problems are connected with an inter- 
esting decomposition theorem by Wold [84], which holds for processes with 
a discrete time variable. The generalization of this theorem to the continuous 
case does not seem to have so far been given in a final form. 
THE ESTIMATION OF DISPERSION FROM DIFFERENCES¹


By Anthony P. Morse² and Frank E. Grubbs
Ballistic Research Laboratory, Aberdeen Proving Ground, Maryland 


Summary. The estimation of variance by use of successive differences of 
higher order is discussed in this paper. Heretofore, attention has been focused, 
in published works, on estimates of variance obtained by employing the sum of 
squares of deviations from the mean and also by using mean square successive 
differences of the first order [1], [2], [3], [9]. A concise description of the method 
employing differences of any order with appropriate formulae for the precision 
of estimates so obtained and also a practical example on the use of the technique
are given in section 11. Fundamental contributions to the estimation of 
variance from higher order differences, a study of the efficiency of the technique 
and proper orientation of the subject matter in the field of mathematical statis- 
tics are given in sections 2-10 of the paper. 

1. Introduction. It frequently happens that successive observations, made 
at regular intervals of time, are subject to the same standard error while the 
means of the populations from which they are drawn display some kind of trend. 
The type of trend we speak of is brought about because of the manner in which 
we have to take measurements or because of variations in the measuring tech- 
nique itself; or, again, the trend may be characteristic of the thing we are meas- 
uring. In any event, we may desire to eliminate the trend in order to study 
residual effects. As an example, it is desirable in the field of ballistics to evaluate 
the dispersion of machine guns firing from a moving airplane. 

It may also happen that it is either inexpedient or impossible to estimate the 
standard error of the observations by the method of least squares, for in a large 
number of cases the type of trend is unknown. In this event a method employing
differences of an appropriate order may prove valuable. The method consists 
merely of arranging the data in a vertical column in the order in which the obser- 
vations were taken and then forming difference columns in the usual way of 
order 1, 2, up to say 5 or some other number depending on the peculiarities of the 
problem at hand and the number of the original observations. Next, sum the 
squares of the numbers in each column and divide the sum of squares of the pth 


order differences by $(n - p)\binom{2p}{p}$. When n ≥ 2 and p ≥ 1, the numbers thus
arrived at are all unbiased estimates of the population variance $\sigma^2$ for the case
where all the observations have the same expected value. In section 11 at the


end of the paper will be found a summary of this method, formulas by which
the precision of the estimate of the variance $\sigma^2$ may be determined, and an
example displaying the stability of this estimate with respect to p.

¹ This paper is based substantially on a Ballistic Research Laboratory Report [10]
on the same subject by Morse and has been prepared for publication by Grubbs at the
suggestion of R. H. Kent. The authors are grateful to J. V. Lewis and H. L. Meyer for their
many and varied comments, criticisms and suggestions.

² Now at the University of California, Berkeley, California.
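The procedure described in the introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `differences` and `variance_from_differences` and the sample data are hypothetical and do not come from the paper; the divisor (n − p) C(2p, p) is the one given in the text.

```python
from math import comb

def differences(xs, p):
    """Return the p-th order successive differences of the sequence xs."""
    out = list(xs)
    for _ in range(p):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def variance_from_differences(xs, p):
    """Sum of squared p-th differences divided by (n - p) * C(2p, p)."""
    n = len(xs)
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    d = differences(xs, p)
    return sum(v * v for v in d) / ((n - p) * comb(2 * p, p))

# Hypothetical data: a quadratic trend plus a small alternating disturbance.
data = [0.5 * t * t + (0.1 if t % 2 else -0.1) for t in range(12)]
# One estimate per order of differencing, to inspect stability with respect to p.
estimates = {p: variance_from_differences(data, p) for p in range(1, 6)}
```

Comparing the entries of `estimates` for successive p is one way to see whether the trend has been removed: a pure quadratic trend, for instance, contributes nothing once p reaches 3.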

If a strong trend is present then the method of first differences will obviously 
yield an estimate of variance which is fictitiously large and the temptation to 
pass to higher order differences may quite reasonably be yielded to. As a matter 
of fact, unbiased estimates may be hoped for from pth order differences whenever 
there is good reason to suppose that the pth derivative of the trend function is 
small most of the time. However, even in the case of a sinusoidal trend where 
all derivatives have the same magnitude one may obtain good results from higher
differences provided there are at least seven observations in each interval of 
length one period (see section 5 and Table II below). In connection with trends 
such as the sinusoidal type, the hopelessness of getting, say, even a fifth degree 
polynomial to fit over an interval of, say 20 periods is rather evident. It is 
for the above reasons that estimation of variance from higher order differences 
deserves consideration. 

2. Historical comment. A brief historical development of the interest in 
successive differences as a means for estimating dispersion is given in [3]. This 
paper discusses the statistic 


$\delta^2 = \frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{n - 1}$


suggested by "Student" [W. S. Gosset] and E. S. Pearson and points out the
relevant work of Jordan, Helmert, Vallier, Cranz, and Becker. It seems that 
Jordan devised methods based on sums of powers of the differences, whereas 
Helmert gave more careful consideration to the case of the first power, i.e. the 
sum of absolute differences. Reference [3] points out, however, that in these 
two cases all the n(n — 1)/2 differences that can be established from a sample of 
n observations were included in the estimates of dispersion recommended by 
Jordan and Helmert, so that the estimate was of no value in reducing the effect 
of a trend. Continuing the remarks of [3], we learn that in ballistics Vallier 
appears to have been the first to estimate dispersion from successive differences 
and that Cranz and Becker commended the mean successive difference 


$E_d = \frac{\sum_{i=1}^{n-1} |x_{i+1} - x_i|}{n - 1}$


in estimating dispersion in range of guns since they were aware of variable ex- 
ternal effects (such as tail winds) on a projectile. In this country, Bennett [1] 
appears to have suggested the use of successive differences independently of 
European ballisticians. In this connection, Bennett suggested that the probable
error should be estimated from the root mean square successive differences as
follows:


$\text{P.E.} = .6745 \sqrt{\frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{2(n - 1)}}.$


TABLE I
Values of W(n, p)

   n |  p=3   |  p=4   |  p=5   |  p=6   |  p=7   |  p=8   |  p=9   |  p=10
   4 | .33333 |        |        |        |        |        |        |
   5 | .32000 | .25000 |        |        |        |        |        |
   6 | .33149 | .24427 | .20000 |        |        |        |        |
   7 | .34453 | .25510 | .19672 | .16667 |        |        |        |
   8 | .35537 | .26871 | .20633 | .16471 | .14286 |        |        |
   9 | .36408 | .28071 | .21888 | .17274 | .14159 | .12500 |        |
  10 | .37113 | .29071 | .23058 | .18385 | .14830 | .12414 | .11111 |
  11 | .37691 | .29904 | .24070 | .19476 | .15802 | .12978 | .11050 | .10000
  12 | .38173 | .30602 | .24934 | .20450 | .16798 | .13827 | .11529 | .09955
  13 | .38580 | .31194 | .25672 | .21300 | .17714 | .14729 | .12271 |   …
  14 | .38928 | .31701 | .26308 | .22039 | .18530 | .15581 | .13086 | .11018
  15 | .39228 | .32139 | .26859 | .22684 | .19250 | .16353 | .13874 | .11754
  16 | .39490 | .32522 | .27342 | .23251 | .19887 | .17045 | .14601 | .12481
  17 | .39721 | .32859 | .27767 | .23752 | .20452 | .17664 | .15260 | .13162
  18 | .39925 | .33158 | .28145 | .24197 | .20956 | .18218 | .15855 | .13787
  19 | .40107 | .33424 | .28482 | .24595 | .21407 | .18715 | .16393 | .14356
  20 | .40271 | .33663 | .28784 | .24953 | .21813 | .19164 | .16879 | .14875
  21 | .40419 | .33880 | .29058 | .25276 | .22181 | .19571 | .17321 | .15347
  22 | .40553 | .34075 | .29306 | .25569 | .22515 | .19941 | .17723 | .15778
  23 | .40675 | .34254 | .29532 | .25837 | .22819 | .20279 | .18091 | .16173
  24 | .40787 | .34417 | .29739 | .26082 | .23098 | .20588 | .18428 | .16535
  25 | .40889 | .34567 | .29929 | .26307 | .23354 | .20873 | .18738 | .16869
  26 | .40984 | .34706 | .30104 | .26514 | .23590 | .21135 | .19024 | .17177
  27 | .41071 | .34833 | .30266 | .26705 | .23809 | .21378 | .19289 | .17463
  28 | .41152 | .34951 | .30416 | .26884 | .24012 | .21603 | .19535 | .17728
  29 | .41228 | .35062 | .30555 | .27049 | .24200 | .21812 | .19764 | .17975
  30 | .41298 | .35165 | .30686 | .27203 | .24375 | .22007 | .19978 | .18205
  31 | .41363 | .35260 | .30807 | .27347 | .24539 | .22190 | .20177 | .18420
  32 | .41425 | .35350 | .30921 | .27482 | .24693 | .22361 | .20364 | .18622
  33 | .41482 | .35434 | .31027 | .27608 | .24837 | .22521 | .20539 | .18811
  34 | .41536 | .35513 | .31128 | .27727 | .24973 | .22672 | .20704 | .18989
  35 | .41587 | .35588 | .31222 | .27839 | .25101 | .22814 | .20859 | .19157
  36 |   …    |   …    | .31312 | .27945 | .25221 | .22949 | .21006 | .19315
  37 | .41671 | .35724 | .31396 | .28045 | .25335 | .23075 | .21145 | .19465
  38 | .41724 | .35787 | .31476 | .28140 | .25443 | .23195 | .21276 | .19606
  39 | .41765 | .35847 | .31551 | .28229 | .25545 | .23309 | .21401 | .19741
  40 | .41804 | .35904 | .31623 | .28314 | .25642 | .23417 | .21519 | .19868


TABLE I (continued)

    n |  p=1   |  p=2   |  p=3   |  p=4   |  p=5
   42 | .67213 | .50828 | .41875 | .36009 | .31756
   44 | .67188 | .50855 | .41941 | .36104 | .31877
   46 | .67164 | .50880 | .42000 | .36191 | .31987
   48 | .67143 | .50903 | .42055 | .36271 | .32088
   50 | .67123 | .50925 | .42105 | .36343 | .32180
   52 | .67105 | .50944 | .42151 | .36411 | .32266
   54 | .67089 | .50962 | .42193 | .36473 | .32345
   56 | .67073 | .50979 | .42233 | .36531 | .32418
   58 | .67059 | .50995 | .42270 | .36585 | .32487
   62 | .67033 | .51022 | .42337 | .36682 | .32609
   66 | .67010 | .51048 | .42395 | .36767 | .32718
   70 | .66990 | .51069 | .42447 | .36843 | .32813
   74 | .66972 | .51089 | .42492 | .36910 | .32898
   78 | .66957 | .51107 | .42534 | .36970 | .32975
   82 | .66942 | .51122 | .42571 | .37024 | .33043
   90 | .66917 | .51150 | .42636 | .37118 | .33162
   98 | .66897 | .51172 | .42689 | .37197 | .33262
  106 | .66879 | .51192 | .42735 | .37263 | .33346
  114 | .66864 | .51208 | .42774 | .37321 | .33418
  122 | .66851 | .51223 | .42808 | .37370 | .33482
  138 | .66829 | .51247 | .42864 | .37452 | .33585
  154 | .66812 | .51266 | .42909 | .37517 | .33667
  170 | .66798 | .51281 | .42944 | .37570 | .33734
  202 | .66777 | .51304 | .43000 | .37649 | .33836
  234 | .66762 | .51322 | .43040 | .37708 | .33909
  266 | .66751 | .51335 | .43070 | .37752 | .33965
  330 | .66734 | .51353 | .43112 | .37814 | .34044
  394 | .66723 | .51365 | .43141 | .37856 | .34097
  522 | .66709 | .51381 | .43178 | .37910 | .34164
  778 | .66695 | .51396 | .43215 | .37963 | .34233
 1290 | .66684 | .51409 | .43245 | .38007 | .34288
 2314 | .66676 | .51418 |   …    | .38036 | .34325
    ∞ | .66667 | .51429 | .43290 | .38073 | .34372


In 1940, J. von Neumann and R. H. Kent in [2] investigated further the estimation
of probable error from mean square successive differences (sums of squares
of first differences). J. von Neumann, R. H. Kent, H. R. Bellinson, and B. I.
Hart [3] considered the distribution of


$\delta^2 = \frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{n - 1}$


in a paper which appeared in June 1941. J. D. Williams [4] obtained the
moments of $\eta = \delta^2 / s^2$, where


$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,$

and indicated that the rth moment of η is equal to the rth moment of $\delta^2$ divided
by the rth moment of $s^2$. The distribution of the ratio of the mean square
successive difference to the variance has been published by J. von Neumann
[5], [6] and B. I. Hart tabulated the probability integral and obtained percentage
points for this statistic ([7], [8]). Indeed, it should be remarked that the statistical
theory of successive differences is allied with the problem of serial correlation
[9]. Finally, the use of squared differences of higher order than the first for
estimating variance appears to have been suggested by A. A. Bennett. Quite
independently, a treatment of the subject was given by Morse [10] in connection
with problems on exterior ballistics. Various results on successive-difference
estimation including significance tests have been given by Tintner [13]. One of
Tintner's tests involves the use of selected sets of differences.

3. Definitions and notations. Suppose the observations $x_1, x_2, x_3, \ldots, x_n$
are made at times $a = t_1 < t_2 < \cdots < t_n = b$ and the $t_i$ are uniformly spaced
without error. Let $f(t_i)$ be the true trend so that $\eta_i = f(t_i)$ is the mean of the
population from which $x_i$ is drawn and $\epsilon_i = x_i - \eta_i$ is a random error. Further,
let p be a non-negative integer less than n and denote the ith backward difference
of order p of x by $\Delta^p x_i$, i.e.


$\Delta^p x_i = \Delta^{p-1} x_i - \Delta^{p-1} x_{i-1} = \sum_{\nu=0}^{p} (-1)^\nu \binom{p}{\nu} x_{i-\nu},$


where $\binom{p}{\nu} = \frac{p!}{(p - \nu)!\,\nu!}$ and $i \ge p + 1$.

We define the following:


(1)    $\delta^2_{n,p} = \frac{1}{\binom{2p}{p}(n - p)} \sum_{i=p+1}^{n} (\Delta^p \epsilon_i)^2;$

(2)    $d^2_{n,p} = \frac{1}{\binom{2p}{p}(n - p)} \sum_{i=p+1}^{n} (\Delta^p x_i)^2;$

(3)    $v^2_{n,p} = \frac{1}{\binom{2p}{p}(n - p)} \sum_{i=p+1}^{n} (\Delta^p \eta_i)^2;$

(4)    $k_{n,p} = \frac{2}{\binom{2p}{p}(n - p)} \sum_{i=p+1}^{n} (\Delta^p \eta_i)(\Delta^p \epsilon_i).$


By E(u) we will mean the expected value of u, whereas the variance of u will
be denoted by


$\operatorname{Var}(u) = E\{u - E(u)\}^2.$


Basically, we shall assume that the $\epsilon_i$ are sufficiently Gaussian and
independent that


$E(\epsilon_i) = E(\epsilon_i^3) = 0, \qquad E(\epsilon_i^2) = \sigma^2,$
$\mu_4 = E(\epsilon_i^4) = 3\sigma^4,$
$E(\epsilon_i^\alpha \epsilon_j^\beta) = E(\epsilon_i^\alpha)\,E(\epsilon_j^\beta),$
whenever i, j, α and β are positive integers for which
tJ), 1444 a, 435 


4. Expected values. We will now determine the mean or expected values
of $\delta^2_{n,p}$ and $d^2_{n,p}$:


$E(\delta^2_{n,p}) = \frac{1}{\binom{2p}{p}(n - p)} \sum_{i=p+1}^{n} E\left\{\sum_{r=0}^{p} (-1)^r \binom{p}{r} \epsilon_{i-r}\right\}^2,$


$E(\delta^2_{n,p}) = \frac{\sigma^2}{\binom{2p}{p}} \sum_{r=0}^{p} \binom{p}{r}^2,$


or


(5)    $E(\delta^2_{n,p}) = \sigma^2$


(see Lemma 1.3 of section 6 below).

Continuing, we have


$E(d^2_{n,p}) = \frac{1}{\binom{2p}{p}(n - p)}\, E \sum_{i=p+1}^{n} (\Delta^p \epsilon_i + \Delta^p \eta_i)^2$

$\phantom{E(d^2_{n,p})} = \frac{1}{\binom{2p}{p}(n - p)} \left\{ (n - p)\binom{2p}{p}\sigma^2 + \sum_{i=p+1}^{n} (\Delta^p \eta_i)^2 \right\},$


(6)    $E(d^2_{n,p}) = \sigma^2 + v^2_{n,p}.$

Consequently, we observe, $d^2_{n,p}$ is on the average larger than $\sigma^2$ by the quantity
$v^2_{n,p}$. In a particular problem, therefore, we are faced with the situation of
choosing that combination of n and p which (i) regulates the size of $v^2_{n,p}$ and (ii)
gives the desired precision of our estimate of variance.
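Relations (5) and (6) can be checked directly for any given trend. The sketch below is an illustration only: the helper names `diff_weights` and `expected_d2` are hypothetical, and the errors are assumed independent with common variance `sigma2` as in section 3. It returns the two parts of E(d²) separately, the noise part (which should equal σ²) and the trend part v².

```python
from math import comb

def diff_weights(p):
    """Coefficients (-1)^r C(p, r) multiplying x_{i-r} in the p-th backward difference."""
    return [(-1) ** r * comb(p, r) for r in range(p + 1)]

def expected_d2(eta, p, sigma2):
    """Return (noise part, trend part) of E(d^2_{n,p}) for population means eta."""
    n = len(eta)
    w = diff_weights(p)
    scale = comb(2 * p, p) * (n - p)
    # Sum over i of E(Delta^p eps_i)^2 = (n - p) * sigma2 * sum of squared weights.
    noise = (n - p) * sigma2 * sum(c * c for c in w)
    # Sum over i of (Delta^p eta_i)^2; index i is 0-based here.
    trend = sum(
        sum(c * eta[i - r] for r, c in enumerate(w)) ** 2
        for i in range(p, n)
    )
    return noise / scale, trend / scale

# Hypothetical example: linear trend eta_t = 3t, first differences, sigma^2 = 2.
noise_part, v2 = expected_d2([3 * t for t in range(10)], 1, 2.0)
```

For the linear trend above every first difference of the means equals 3, so the trend part is 9/2 while the noise part stays at σ² = 2; passing to second differences removes the trend part entirely.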

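5. The magnitude of $v^2_{n,p}$. In order to study the size of $v^2_{n,p}$, we will derive
for this quantity an upper bound which will indicate the applicability of the
method of differences to non-polynomial as well as polynomial trends.

Now,

$\Delta^p \eta_i = \Delta^p f(t_i) = \int_{t_{i-1}}^{t_i} \int_0^h \cdots \int_0^h f^{(p)}(y_1 - y_2 - \cdots - y_p)\, dy_p \cdots dy_2\, dy_1,$

where $t_i - t_{i-1} = h$, by straightforward integration. It will be convenient to
change the order of integration; thus


$\Delta^p f(t_i) = \int_0^h \cdots \int_0^h \int_{t_{i-1}}^{t_i} f^{(p)}(y_1 - y_2 - \cdots - y_p)\, dy_1\, dy_p \cdots dy_2.$


Since, from Schwarz's inequality it is clear that

$\left\{ \int_\alpha^\beta g(s)\, ds \right\}^2 \le (\beta - \alpha) \int_\alpha^\beta \{g(s)\}^2\, ds$

whenever α and β are real numbers and g is integrable, we have


$\{\Delta^p \eta_i\}^2 \le h^p \int_0^h \cdots \int_0^h \int_{t_{i-1}}^{t_i} \{f^{(p)}(y_1 - y_2 - \cdots - y_p)\}^2\, dy_1\, dy_p \cdots dy_2.$


Also,


$\sum_{i=p+1}^{n} \{\Delta^p \eta_i\}^2 \le h^p \int_0^h \cdots \int_0^h \int_{t_p}^{t_n} \{f^{(p)}(y_1 - y_2 - \cdots - y_p)\}^2\, dy_1\, dy_p \cdots dy_2.$

But for $0 \le r \le (p - 1)h = t_p - a$ we have

$\int_{t_p}^{t_n} \{f^{(p)}(y_1 - r)\}^2\, dy_1 = \int_{t_p - r}^{t_n - r} \{f^{(p)}(s)\}^2\, ds \le \int_a^b \{f^{(p)}(s)\}^2\, ds.$

Consequently


$\sum_{i=p+1}^{n} \{\Delta^p \eta_i\}^2 \le h^p \int_0^h \cdots \int_0^h \int_a^b \{f^{(p)}(s)\}^2\, ds\, dy_p \cdots dy_2 = h^{2p-1} \int_a^b \{f^{(p)}(s)\}^2\, ds.$


Since $h = \frac{b - a}{n - 1}$, we have finally


(7)    $v^2_{n,p} \le \frac{1}{\binom{2p}{p}(n - p)} \left( \frac{b - a}{n - 1} \right)^{2p-1} \int_a^b \{f^{(p)}(s)\}^2\, ds,$


which is an upper bound for $v^2_{n,p}$ in terms of the average value of the square of
the pth derivative of the trend function f.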


If the trend function f is of the polynomial form,

$f(t) = \sum_{\nu=0}^{p} a_\nu t^\nu,$


then the effect of the trend can be eliminated from our observations by estimating
dispersion from (p + 1)st differences. However, if it is known that the trend is
of polynomial form, then an estimate of dispersion based on least squares would,
of course, be better. In fact, it will be shown later that the precision of $\delta^2_{n,p}$
decreases markedly as p increases. The use of $d^2_{n,p}$ as an estimate of $\sigma^2$ is
primarily of value when the type of trend is unknown; however, even when the type
of trend is known the computational simplicity of $d^2_{n,p}$ may offset to some extent
its lack of optimum precision.

Let us reflect on the magnitude of $v^2_{n,p}$ over a single period of a sinusoidal trend,
say f(t) = sin t. In (7) we set a = 0, b = 2π and secure


(7')    $v^2_{n,p} \le \frac{\pi}{\binom{2p}{p}(n - p)} \left( \frac{2\pi}{n - 1} \right)^{2p-1}.$


Taking n to be the number of observations for a complete period, a tabulation of
the upper bound for $v^2_{n,p}$ for this case is given in Table II. Thus, when there
are about seven or more observations in each interval of length one period,
estimation of dispersion from higher order differences may prove of considerable
value even for this rather extreme type of trend.
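The bound for the sinusoidal case is easy to evaluate numerically. The sketch below assumes the form of (7) derived above with a = 0, b = 2π, and uses the fact that the integral of the squared pth derivative of sin t over one period is π; the function name `sinusoid_bound` is hypothetical.

```python
from math import comb, pi

def sinusoid_bound(n, p):
    """Upper bound (7) on v^2_{n,p} for f(t) = sin t over one period [0, 2*pi]."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    h = 2 * pi / (n - 1)   # spacing of n equally spaced observations over the period
    integral = pi          # integral of {f^(p)(s)}^2 over [0, 2*pi] for f = sin
    return h ** (2 * p - 1) * integral / (comb(2 * p, p) * (n - p))

# Bounds for p = 1..4 with seven observations per period.
bounds_n7 = [sinusoid_bound(7, p) for p in range(1, 5)]
```

With seven observations per period the bound falls quickly as p increases, which is the behaviour the text points to.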

6. Some combinatorial relations. Although we will ultimately establish
expressions for the variances of $\delta^2_{n,p}$ and $d^2_{n,p}$, it appears desirable to give first a
number of combinatorial relations which present themselves in the computation
of moments. The relations are easily checked and most of them are possibly
well known. Nevertheless, it will be convenient to record them for reference
and in some instances to give proofs. In what follows it will be understood that


$\binom{p}{q} = 0$ whenever p and q are not such integers that $0 \le q \le p$.

TABLE II
Upper bound for $v^2_{n,p}$, f(t) = sin t

  p \ n |   5   |   6   |   7   |   8   |   9   |  10
    1   | .617  | .395  | .274  | .201  | .154  | .110
    2   | .676  | .260  | .120  | .063  | .036  | .016
    3   | .751  | .164  | .049  | .018  | .008  | .002
    4   | 1.06  |   …   | .021  | .005  | .002  | .0003
    5   |   —   |   …   | .009  | .002  | .0004 | .0000
Lemma 1.1. a(?) = r( 7 > 
q q-1 
Lemma 1.2. (?) = 
r —= — % 
LemMaA 1.3. - \(, : ) = iz .): 
PROOF: 
- s 2p ( 2 ) 8 . 
¥ (7?) 2 = 04a" = ta +o" = {2 (®) 24 
-EE()(.2)# 
5 = = 
Hence 
2p\ _ Pp Pp 
(7)- EG)(2,) 
and 


-t- 


2p \_ p p ‘ Pp 2. p p 
eS. 4-9-5 2) 
212 P\ _(p-1 p—-1 
Lemma 14. [fp +r>0 then (? = ( . ) e a 3 
_ p\ _ p~ tos 
Lemma 1.5. (p 2n)(?) n( 7 ) e 7 ')}. 
2 2 2 
- p\ _ yf) feos 
LemMa 1.6. (p 2»)(?) =p ( ‘ ) E % ry $. 
Proor: Multiply, using 1.4 and 1.5. 


‘ 2p \_  f{/27~-1\'_ ( ®-1\ 
Lemna 1.7. (7 ) =>} —_— ean 


2 Major A. A. Bennett communicated this Lemma. 
proor: ( — 20(8) = 5{(* 7") — (27> 1) bom 16. 


Put s = 2p, = p — 7, then 


2p Fase 2p —1\" | 2p>-1\ 
a,” ,) = 204(7? — ?) io a 


Lemma 1.8. I[f f is a function, i, n, p are integers and p + 1 <i <n, then 


| %(,? se -9=E()s0. 


-PROOF: 
o(,? sen = 2 (P19 = 5 ()s0. 


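Lemma 1.9. If $-\infty < A(r, s) = A(s, r) < \infty$ for each integer r and s, then


$E\left( \left\{ \sum_{r=1}^{n} \sum_{s=1}^{n} A(r, s)\, \epsilon_r \epsilon_s \right\}^2 \right) = (\mu_4 - 3\sigma^4) \sum_{r=1}^{n} A(r, r)^2 + \sigma^4 \left\{ \sum_{r=1}^{n} A(r, r) \right\}^2 + 2\sigma^4 \sum_{r=1}^{n} \sum_{s=1}^{n} A(r, s)^2.$


Proof: Let N(r, s) = 1 when r < s and let N(r, s) = 0 otherwise. Clearly


$\sum_{r=1}^{n} \sum_{s=1}^{n} A(r, s)\, \epsilon_r \epsilon_s = \sum_{r=1}^{n} A(r, r)\, \epsilon_r^2 + 2 \sum_{r=1}^{n} \sum_{s=1}^{n} N(r, s) A(r, s)\, \epsilon_r \epsilon_s,$


and


$E\left( \left\{ \sum_{r=1}^{n} \sum_{s=1}^{n} A(r, s)\, \epsilon_r \epsilon_s \right\}^2 \right) = E\left( \left\{ \sum_{r=1}^{n} A(r, r)\, \epsilon_r^2 \right\}^2 \right) + 4E\left( \left\{ \sum_{r=1}^{n} \sum_{s=1}^{n} N(r, s) A(r, s)\, \epsilon_r \epsilon_s \right\}^2 \right).$


Now


$E\left( \left\{ \sum_{r=1}^{n} A(r, r)\, \epsilon_r^2 \right\}^2 \right) = (\mu_4 - \sigma^4) \sum_{r=1}^{n} A(r, r)^2 + \sigma^4 \left\{ \sum_{r=1}^{n} A(r, r) \right\}^2$


and


$4E\left( \left\{ \sum_{r=1}^{n} \sum_{s=1}^{n} N(r, s) A(r, s)\, \epsilon_r \epsilon_s \right\}^2 \right) = 4\sigma^4 \sum_{r=1}^{n} \sum_{s=1}^{n} N(r, s) A(r, s)^2 = 2\sigma^4 \sum_{r=1}^{n} \sum_{s=1}^{n} A(r, s)^2 - 2\sigma^4 \sum_{r=1}^{n} A(r, r)^2.$


The last three relations combine to yield the desired result.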
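The identity of Lemma 1.9 does not require normality, only independent errors with vanishing odd moments, so it can be verified exactly by enumeration on a small example. The sketch below is an illustration under assumptions of our own choosing: a symmetric three-point error law, a 3 × 3 symmetric matrix, and the hypothetical function names `lemma_1_9_rhs` and `lemma_1_9_lhs`. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction
from itertools import product

def lemma_1_9_rhs(A, sigma2, mu4):
    """Right-hand side of Lemma 1.9 for a symmetric matrix A."""
    n = len(A)
    diag = [A[r][r] for r in range(n)]
    return ((mu4 - 3 * sigma2 ** 2) * sum(a * a for a in diag)
            + sigma2 ** 2 * sum(diag) ** 2
            + 2 * sigma2 ** 2 * sum(A[r][s] ** 2 for r in range(n) for s in range(n)))

def lemma_1_9_lhs(A, values, probs):
    """Exact E{(sum_r sum_s A[r][s] e_r e_s)^2} for i.i.d. discrete errors."""
    n = len(A)
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        e = [values[k] for k in idx]
        weight = Fraction(1)
        for k in idx:
            weight *= probs[k]          # probability of this joint outcome
        q = sum(A[r][s] * e[r] * e[s] for r in range(n) for s in range(n))
        total += weight * q * q
    return total

# Hypothetical symmetric three-point error law: P(-1) = P(1) = 1/4, P(0) = 1/2.
values = [Fraction(-1), Fraction(0), Fraction(1)]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
sigma2 = Fraction(1, 2)   # E(e^2)
mu4 = Fraction(1, 2)      # E(e^4); the odd moments vanish by symmetry
A = [[2, -1, 0], [-1, 3, 1], [0, 1, -2]]   # an arbitrary symmetric matrix
```

Because this error law has μ₄ different from 3σ⁴, the check exercises the first term of the lemma as well, which drops out in the Gaussian case used later in the paper.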


LEMMA 1.10. ( 


2p ? 217/64 
, (n — p) E(6,,,p) 


= (us - 30) 1 (;? yh + of 


+ 26° 3. 


S=1 


Proor: Helped by 1.8, check that 


(A?e)? = p> bee 


Therefore 


C PY in - p)d..p -> yo 


Let. 


A(r, s) = (- 


r=1 s=l 


and apply 1.9 to complete the proof. 


LemMaA 1.11. 


EEAE.62 )G2)} 


PRooF. 


21 2G2)G2,)} 


3 


T=1 


n 
i= pt+l j=ptl r=1 s=1 
nm n n Pp 
i=p+l1 j=p+l r=—1 s=0 
n P 


. 2 


t= pt+l je pt+l r=—0 s=0 


(n—p) dD 


n—Dp ( 2p 
mane > 


yr > 


i= ptl 


> &, (; : )t 
en (ea) 


1)’ (?) cr} = 2 (~§) 


=> Dd (-1 et, : 


“(,? Def 


(2 ,)e« 


= 


i= pt+l 


2 2p — 1\? 
) ~ 95 ') +296 


- )(.? hee: 
je Me? a): 


HEC? )G2)62) G2.) 


0)(.48-)GeP) GE) mines 
C.F JO. 


i) using 1.8 again; 


2p — 1\’ 
%, FOG 5-JEOGG-3) 
7 dy > = (*) (, +3 - } : x, Xe ly +- ): en 
= Fe -9-1r(F,) 


n—p 


- ¥ @-p-iri)(,”,) 


=n—v) & (0%) 2 r(,?,) 
T=p—n ptr r=0 pt+r 
v ( 2 > S(2p -—1\ 2p-1\\ .. 
-m-») & (,”)- 23 {(?- ") - (77-1), using 1.7; 
> 2p 2p—1\° | 27-1 \ 
-o- 9) Z(t.) - 25 ) ~ (pena) 
— ( 2» 2p — ‘ ‘ = ' 
= _ 2 
nee Ehet 7: 2» ( iT « 
Lemma 1.12. 
EE (2-0-9) 
r=1 i=ptl \t — Pp 
PROOF. 


csi? )- ap » ?)- > (?) , from 1.8; 
r=l i=pt+l ee i=pt+l rel a i=p+l r=0 


--»Z(?) = mp) (*). 


7. The variances of $\delta^2_{n,p}$ and $d^2_{n,p}$. In order to get some idea as to the efficiency
of the statistics $\delta^2_{n,p}$ and $d^2_{n,p}$, we will examine their variances. We have


$\binom{2p}{p}^2 (n - p)^2 \operatorname{Var}(\delta^2_{n,p}) = \binom{2p}{p}^2 (n - p)^2 \left\{ E(\delta^4_{n,p}) - [E(\delta^2_{n,p})]^2 \right\},$

which is evaluated with the aid of Lemmas 1.10, 1.11, 1.12 and using the relation $\mu_4 - 3\sigma^4 = 0$.
Thus,


(8)    $\binom{2p}{p}^2 (n - p)^2 \operatorname{Var}(\delta^2_{n,p}) = 2(n - p)\sigma^4 \sum_{r=p-n}^{n-p} \binom{2p}{p+r}^2 - 4p\sigma^4 \binom{2p-1}{p}^2 + 4p\sigma^4 \binom{2p-1}{n}^2.$


Moreover, $\binom{2p-1}{n} = 0$ when $2p \le n$.

Therefore,


(9)    $\binom{2p}{p}^2 (n - p)^2 \operatorname{Var}(\delta^2_{n,p}) = 2(n - p)\binom{4p}{2p}\sigma^4 - 4p\binom{2p-1}{p}^2 \sigma^4$


when $2p \le n$.

As for the variance of $d^2_{n,p}$, we have


$\operatorname{Var}(d^2_{n,p}) = E\{d^2_{n,p} - v^2_{n,p} - \sigma^2\}^2 = E\{(\delta^2_{n,p} - \sigma^2) + k_{n,p}\}^2,$

or

(10)    $\operatorname{Var}(d^2_{n,p}) = \operatorname{Var}(\delta^2_{n,p}) + E(k^2_{n,p}),$


since $E[(\delta^2_{n,p} - \sigma^2)\, k_{n,p}] = 0$.
However, from Schwarz's inequality, it is guaranteed that


$E(k^2_{n,p}) \le 4 v^2_{n,p}\, \sigma^2.$

Thus

(11)    $\operatorname{Var}(d^2_{n,p}) \le \operatorname{Var}(\delta^2_{n,p}) + 4\sigma^2 v^2_{n,p}.$


An upper bound has already been given for $v^2_{n,p}$ in section 5 above.

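8. The efficiency of $\delta^2_{n,p}$. It is appropriate to consider the efficiency (as
defined by Fisher [11]) of the statistic $\delta^2_{n,p}$. In this sense, the efficiency of
$\delta^2_{n,p}$ is given by


$W(n, p) = \frac{\operatorname{Var}(s^2_n)}{\operatorname{Var}(\delta^2_{n,p})}, \quad \text{where } s^2_n = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$

Accordingly,

$W(n, p) = \frac{2\sigma^4}{(n - 1) \operatorname{Var}(\delta^2_{n,p})},$


(12)    $W(n, p) = \frac{(n - p)^2 \binom{2p}{p}^2}{(n - 1) \left\{ (n - p) \sum_{r=p-n}^{n-p} \binom{2p}{p+r}^2 - 2p \binom{2p-1}{p}^2 + 2p \binom{2p-1}{n}^2 \right\}}.$

If $2p \le n$,

(13)    $W(n, p) = \frac{(n - p)^2 \binom{2p}{p}^2}{(n - 1) \left\{ (n - p) \binom{4p}{2p} - 2p \binom{2p-1}{p}^2 \right\}},$


(14)    $W(n, p) = \frac{(n - p)^2}{(n - 1) \left\{ (n - p) \binom{4p}{2p} \Big/ \binom{2p}{p}^2 - \frac{p}{2} \right\}}$


if $2p \le n$.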


Formulas (12) and (13) were used in preparing Table I given at the end of the
paper. For convenience in using formulas (1) and (2) the binomial coefficients
$\binom{2p}{p}$ for $0 \le p \le 10$ are given in Table III.
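Formula (12) is straightforward to evaluate by machine. The sketch below is an illustration only; the function name `efficiency` is hypothetical, and the formula is taken in the form $(n-p)^2\binom{2p}{p}^2$ divided by $(n-1)$ times the bracket of binomial sums, as used for Table I. Python's `math.comb(a, b)` returns 0 when b exceeds a, which matches the convention of section 6 for the term $\binom{2p-1}{n}$.

```python
from math import comb

def efficiency(n, p):
    """W(n, p) from formula (12), valid for 1 <= p < n."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    c = comb(2 * p, p)
    # Sum of C(2p, p + r)^2 over r = p - n, ..., n - p, skipping out-of-range terms.
    s = sum(comb(2 * p, p + r) ** 2
            for r in range(p - n, n - p + 1)
            if 0 <= p + r <= 2 * p)
    bracket = ((n - p) * s
               - 2 * p * comb(2 * p - 1, p) ** 2
               + 2 * p * comb(2 * p - 1, n) ** 2)
    return (n - p) ** 2 * c ** 2 / ((n - 1) * bracket)

# A few values that can be compared with Table I.
w_42_1 = efficiency(42, 1)
w_40_5 = efficiency(40, 5)
w_6_5 = efficiency(6, 5)    # n = p + 1: a single difference, efficiency 1 / (n - 1)
```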


If $n \ge 2$, then


(15)    $W(n, 1) = \frac{2(n - 1)}{3n - 4} = \frac{2}{3} \cdot \frac{1}{1 - \dfrac{1}{3n - 3}},$


as was pointed out by von Neumann, Kent, Bellinson, and Hart in [3].
If $n \ge 4$, then


(16)    $W(n, 2) = \frac{18}{35} \cdot \frac{1}{\left(1 + \dfrac{1}{n - 2}\right)\left(1 - \dfrac{18}{35(n - 2)}\right)} = \frac{18(n - 2)^2}{(n - 1)(35n - 88)}.$

As a limiting value for n, we have


(17)    $W(\infty, p) = \lim_{n \to \infty} W(n, p) = \frac{\binom{2p}{p}^2}{\binom{4p}{2p}}.$

Using Stirling's formula for the approximation to the factorial, we have


$\lim_{p \to \infty} \sqrt{p}\; W(\infty, p) = \sqrt{2/\pi}.$


Thus, as p → ∞, W(∞, p) tends to zero and is asymptotically equal to $\sqrt{\dfrac{2}{\pi p}}$.

TABLE III
The Binomial Coefficient $\binom{2p}{p}$

   p | C(2p, p)
   0 |      1
   1 |      2
   2 |      6
   3 |     20
   4 |     70
   5 |    252
   6 |    924
   7 |   3432
   8 |  12870
   9 |  48620
  10 | 184756

For the case n ≥ 2, p ≥ 1 and f constant, then $s^2_n = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ and $\delta^2_{n,p}$
and $d^2_{n,p}$ are all unbiased estimates of the population variance $\sigma^2$. Moreover,
for this case


$W(n, p) = \frac{\operatorname{Var}(s^2_n)}{\operatorname{Var}(\delta^2_{n,p})} = \frac{\operatorname{Var}(s^2_n)}{\operatorname{Var}(d^2_{n,p})}.$


Using $s^2_m$ based on m − 1 degrees of freedom and keeping the trend, f, constant,
then m and n may be chosen so that approximately

$\operatorname{Var}(s^2_m) = \operatorname{Var}(d^2_{n,p})$

and for a normal population this means that


$m = 1 + (n - 1)\, W(n, p).$

Using Table I, it may be seen that for constant trend, f, the worth of $d^2_{20,10}$ as
an estimate of $\sigma^2$ for a normal population is about the same as that of $s^2_4$, whereas
that of $d^2_{20,1}$ is about equivalent to $s^2_{14}$. However, if the trend f is not constant
then the worth of $s^2_n$ as an estimate of $\sigma^2$ is diminished while that of $d^2_{n,p}$ is
increased.

Similarly, if the trend is cubic over 20 observations then least squares gives
an unbiased estimate of $\sigma^2$ based on 16 degrees of freedom, whereas $d^2_{20,4}$ gives an
estimate equivalent in precision to about 6.4 degrees of freedom. However, if
only eight observations follow a cubic trend, then least squares furnishes an
unbiased estimate of $\sigma^2$ based on four degrees of freedom whereas $d^2_{8,4}$ furnishes an
estimate equivalent to about 1.9 degrees of freedom. Thus, in the case of 20
observations, cubic least squares is, so to speak, 2.5 times as valuable as $d^2_{20,4}$;
in the case of eight observations, cubic least squares is 2.1 times as valuable
as $d^2_{8,4}$.

It might be mentioned that the method of differences is of value in estimating
goodness of fit. If the fit is good, then our estimate of $\sigma^2$ derived from least
squares should on the average be equal to the estimate derived from a suitable
$d^2_{n,p}$. If the fit is poor then $d^2_{n,p}$ will be smaller on the average than the former.
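The comparisons in degrees of freedom quoted above follow from m = 1 + (n − 1)W(n, p). The sketch below reproduces them under the assumption that 2p ≤ n, so that the closed form (13) applies; the function names `efficiency_2p_le_n` and `equivalent_sample_size` are hypothetical.

```python
from math import comb

def efficiency_2p_le_n(n, p):
    """W(n, p) from formula (13); valid only when 2p <= n."""
    if p < 1 or 2 * p > n:
        raise ValueError("formula (13) requires 1 <= p and 2p <= n")
    c = comb(2 * p, p)
    denom = (n - 1) * ((n - p) * comb(4 * p, 2 * p)
                       - 2 * p * comb(2 * p - 1, p) ** 2)
    return (n - p) ** 2 * c ** 2 / denom

def equivalent_sample_size(n, p):
    """m such that s_m^2 and d^2_{n,p} have about equal variance (constant trend)."""
    return 1 + (n - 1) * efficiency_2p_le_n(n, p)

# Equivalent degrees of freedom, m - 1, for the cases discussed in the text.
df_20_4 = equivalent_sample_size(20, 4) - 1
df_8_4 = equivalent_sample_size(8, 4) - 1
```

Dividing the least squares degrees of freedom by these values (16 by `df_20_4`, 4 by `df_8_4`) gives the ratios of about 2.5 and 2.1 mentioned in the text.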

9. The approximate probable error in estimating σ from differences. The
approximate standard error of $\delta_{n,p}$ is given by the relation


$\text{S.E.}(\delta_{n,p}) \approx \frac{1}{2} \cdot \frac{\text{S.E.}(\delta^2_{n,p})}{\sigma} = \frac{\sigma}{\sqrt{2(n - 1)\, W(n, p)}}.$

If p has been so chosen that $v^2_{n,p}$ is suitably small then [see equation (11)]
some confidence may be put in the approximate formulas:


(18)    $\text{S.E.}(d_{n,p}) \approx \frac{\sigma}{\sqrt{2(n - 1)\, W(n, p)}},$

(19)    $\text{P.E.}(d_{n,p}) \approx \frac{.6745\, \sigma}{\sqrt{2(n - 1)\, W(n, p)}}.$

Formula (19) was used in preparing Table IV which gives the approximate
probable error to be feared in using $d_{n,p}$ as an estimate of σ. This table should
yield interesting information whenever p has been chosen so that $d^2_{n,p}$ is a suitably
unbiased estimate of $\sigma^2$.
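Formula (19) can be evaluated with the closed forms (15) and (16) for p = 1 and p = 2. The sketch below is an illustration only; the function names `w1`, `w2` and `probable_error_factor` are hypothetical, and the result is the multiplier of σ, as in Table IV.

```python
from math import sqrt

def w1(n):
    """W(n, 1) from formula (15), n >= 2."""
    return 2 * (n - 1) / (3 * n - 4)

def w2(n):
    """W(n, 2) from formula (16), n >= 4."""
    return 18 * (n - 2) ** 2 / ((n - 1) * (35 * n - 88))

def probable_error_factor(n, w):
    """Multiplier of sigma in formula (19), given n and the efficiency w = W(n, p)."""
    return 0.6745 / sqrt(2 * (n - 1) * w)

pe_2_1 = probable_error_factor(2, w1(2))     # two observations, first differences
pe_5_2 = probable_error_factor(5, w2(5))     # five observations, second differences
pe_42_1 = probable_error_factor(42, w1(42))
```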


10. Remarks. We have presented a useful technique for estimating variance
from higher order differences and have given the precision of our estimate. The
method of estimating variance from higher order differences appears to be quite
valuable in cases where the type of trend in our observations is unknown. A
considerable field of work remains concerning a complete investigation of the
distribution and other properties of the statistic $d^2_{n,p}$. In this connection,
Baer [12] has already published a study on the stochastic limit of ——— i r 
It is hoped that others will contribute to the problem of estimating dispersion 







TABLE IV 
The Probable Error In Estimating σ From Differences*


"WA 


e 


.4769 | 
.4054 | .4769 
.3495 | .4215 | .4769 
.3104 | .3704 | .4404 | .4769 
.2817 | .3318 | .3855 | .4390 | .4769 
.2596 | .3024 | .3477 | .3969 | .4442 | .4769 

.2420 | .2794 | .3183 | .3604 | .4057 | .4481 | .4769 | 

.2274 | .2610 | .2948 | .3311 | .3708 | .4128 | .4513 | .4769] 











coe mnoarh WN Fe 


2153 | .2457 | .2758 | .3074 | BAIT | .3794 | .4186 | .4537] . 
.2048 | .2328 | .2599 | .2880 | .3180 | .3508 | .3867 | .4234 
1958 | .2217 | .2465 | .2717 | .2983 | .3272 | .3587 | -3930) 
.1878 | .2120 | .2350 | .2579 | .2818 | .3073 | .3351 | .3656| . 
.1808 | .2035 | .2248 | .2459 | .2677 | .2905 | .3152 | .3423] . 
1744 | .1960 | .2159 | .2355 | .2554 | .2761 | .2983 | 3293) 
| .1687 | .1892 | .2080 | .2262 | .2447 | .2637 | .2837 3052) .3: 
.1636 | .1831 | .2009 | .2180 | .2352 | .2527 | .2710 | .2905) . 
1589 | .1775 | .1945 | .2106 | .2267 | .2430 | .2599 | .2777) . 
1545 | .1724 | .1886 | .2040 | .2191 | .2343 | .2500 | -2663] . 








.1505 | .1677 | .1832 | .1978 | .2121 | .2264 | .2411 | 2562] . 
.1468 | .1634 | .1783 | .1922 | .2058 | .2193 | .2331 | .2472) «4 
.1433 | .1594 | .1738 | .1871 | .2000 | .2129 , .2258 | .2391| . 
1401 | .1557 | .1695 | .1824 | .1948 | .2069 | .2191 | .2316) . 
.1371 | .1522 | .1656 | .1779 | .1898 | .2015 | .2131 | .2249 
.1343 | .1490 | .1619 | .1739 | .1853 | .1964 | .2075 | .2187| 
.1316 | .1459 | .1585 | .1700 | .1810 | .1917 | .2023 | .2130! . 
1291 | .1431 | .1553 | .1664 | .1770 | .1873 | .1975 | .2077] . 
.1268 | .1404 | .1522 | .1631 , .1733 | .1832 | .1930 | .2028) . 
.1245 | .1378 | .1493 | .1599 | .1698 | .1794  .1888 , .1981| . 








.1224 | .1354 | 1466 | - 1569 | 1665 | .1758 | .1848 | .1938) . 
.1204 | .1331 | .1441 | .1540 | .1634 | .1724 | .1811 | .1898 
.1184 | .1309 | .1416 | .1514 | .1605 | .1692 | .1776 | .1860) . 


| 
| 
| .1166 | .1288 | .1393 | .1488 | .1577 | .1661 | .1744 a 
| 
| 
| 








-1149 | .1268 | .1371 | .1464 | .1550 | .1632 | .1713 | .1791 
1132 | .1249 | .13850 | .1441 | .1525 | .1605 | .1683 | .1759 
.1116 | .1231 | .1330 | .1418 | .1501 | .1579 | .1655 | .1729 
-1101 | .1214 | .1311 | .1397 | .1478 | .1555 | .1628 | .1700 




















.1072 | .1181 | .1274 | .1358 | 1435 .1508 | .1578 | .1646 


.1086 | .1197 | .1292 | .1377 | .1456 | .1531 | .1603 | 1640 
TABLE IV (continued)


3 | 4 | 5 | 6 8 


| .0736 | .0909 | .1045 | .1151 | .1241 


| .1322 | .1396 | . .1533 | .1597 
| 0719 | .0887 | .1020 | .1123 | .1211 | .1288 | .1360 | . .1491 | .1553 
| -0703 | .0868 | .0997 | .1097 | .1182 | .1257 | .1326 | . .1453 | .1512 
| 0689 | .0849 | .0975 | .1073 | .1155 | .1228 | .1295 | . .1417 | .1474 
| .0675 | .0832 | .0955 | .1050 | .1130 | .1201 | .1266 | . .1383 | .1438 
| .0661 | .0815 | .0936 | .1029 | .1107 | .1176 | .1238 | . .1352 | .1405 
| 0649 | .0800 | .0918 | .1009 | .1085 | .1152 | .1213 | . | .1323 | .1375 
| .0637 | .0785 | .0901 | .0990 | .1064 | .1129 | .1189 | . .1296 | .1346 

-0626 | .0771 | .0885 | .0972 | .1045 | .1108 | .1166 | . .1271 | .1319 





| 0606 | .0746 | .0855 | .0939 | .1008 | .1069 | .1125 | . .1224 | .1270 
| .0587 | .0723 | .0828 | .0909 | .0975 | .1034 | .1087 | . .1182 | .1225 
| .0570 | .0702 | .0804 | .0881 | .0946 | .1002 | .1053 | . .1144 | .1185 
.0554 | .0682 | .0781 | .0856 | .0919 | .0973 | .1022 | . .1109 | .1149 
.0540 | .0664 | .0760 | .0833 | .0804 | .0947 | .0994 | . 1077 | .1115 
.0527 | .0648 | .0741 | .0812 | .0871 | .0922 | .0968 | . .1048 | .1085 











| 0503 | .0618 0707 | .0774 | .0830 | .0878 | .0921 | .Os .0997 | .1031 
| .0482 | .0592 | .0677 | .0741 | .0794 | .0840 | .o88so0 | . .0952 | .0984 
.0463 | .0569 | .0650 | .0712 | .0762 | .0806 | .0845 | . .0913 | .0943 
| .0447 | .0549 | .0627 | .0686 | .0734 | .0776 | .0813 | . .0878 | .0907 
.0432 | .0530 | .0606 | .0663 | .0709 | .0749 | .0785 | . .0847 | .0875} .0900 





| .0406 | .0498 | .0569 | .0622 | .0666 | .0703 | .0736 | . .0794 | .0819} .0843 
| 0384 | .0472 | .0538 | .0589 | .0630 | .0664 | .0695 | . .0749 | .0773| .0795 
| 0366 | .0449 | .0512 | .0560 | .0599 | .0632 | .0661 | .0687 | .0711 | .0734| .0755 





Pe ee ae | 

| .0336 | -0412 | .0470 | .0513 | 0548 | .0578 | .0605 | .0629 | .0650 | .0671| .0689 

| .0312 | .0382 | .0436 | .0476 | .0509 | .0537 | .0561 | .0583 | .0603 | .0621| .0639 
.0292 | .0359 | .0409 | .0446 | .0477 | .0503 | .0525 | .0546 | .0565 | .0582| .0598 


| .0262 | .0322 | .0367 | .0400 | .0428 | .0451 | .0471 | .0489 | .0505 | .0521/ .0535 
| 0240 | . .0336 | .0366 | .0391 | .0412 | .0430 | .0447 | .0462 | .0475] .0488 





522 .0209 | . .0292 | -0318 | .0339 | .0357 | .0373 | .0387 | .0400 | .0412) .0423 


778 | .0171 |. | 0239 | .0260 | .0278 | .0292 | .0305 | .0317 | .0327 | .0337/ .0346 





1290 | .0133 | . .0185 | .0202 | .0216 | .0227 | .0237 | .0246 | .0254 | .0261| .0268 
| | | | 
2314 | .0099 | . .0138 | .0151 | .0161 | .0169 | .0177 | .0183 | .0189 | .0195| .0200 

















* If $d^2_{n,p}$ is a sufficiently unbiased estimate of $\sigma^2$, then the approximate probable error
to be feared in using $d_{n,p}$ as an estimate of $\sigma$ may be obtained by multiplying the following
tabular entries by $\sigma$.
when observed data display trends as it is believed that the method of differences
deserves much attention. In particular, it is hoped that someone will have the
time and ingenuity to calculate the distribution of the statistic (see footnote 4)

$$\frac{\delta^2_{n,p}}{\delta^2_{n,p+1}}.$$

Were this done, an admirable criterion would be at hand for gauging the significance
of a change in the estimate of $\sigma^2$ as we pass from differences of order p to
those of order p + 1. Of course, useful information in this connection could be
obtained from a knowledge of the distributions of $\delta^2_{n,p}$ and $\delta^2_{n,p+1}$; in fact their
variances as herein calculated give us a basis for somewhat reasonable conclusions.
An expression for the standard error of the difference between the
estimates of $\sigma^2$ from two consecutive series of finite differences is given in
[13, Chapter VI].

In connection with testing goodness of fit, it would be valuable also to know
the distribution of

$$\frac{S^2_{n,p}}{\delta^2_{n,p+1}},$$

where $S^2_{n,p}$ is the estimate of variance derived from the least squares fitting of a
polynomial of degree p.

For convenience of reference, we conclude the paper with

11. A concise description of the method and its precision. It frequently
happens that successive observations made at regular intervals are subject to
the same standard error $\sigma$ while the means of the populations from which they
are drawn display a trend. We give here a method of estimating the variance $\sigma^2$
and of determining the precision of our estimate. This method is primarily of
value when the trend is unknown; however, even when the type of trend is known,
its computational simplicity may make the method advantageous.


The method. Arrange the data in a vertical column and then in the usual
way form difference columns of order 1, 2, ..., p. Sum the squares of the pth
order differences and divide by the number $(n - p)\binom{2p}{p}$. Our estimate of $\sigma^2$
is the number $d^2_{n,p}$, where, writing $\Delta^p x_i$ for the pth order differences,

$$d^2_{n,p} = \frac{\sum_{i=1}^{n-p} (\Delta^p x_i)^2}{(n-p)\binom{2p}{p}}.$$

$^4$ Dixon [9] gives moments of the statistic
$\sum_{i=1}^{n} (x_i - 2x_{i+1} + x_{i+2})^2 \Big/ \sum_{i=1}^{n} (x_i - x_{i+1})^2$,
where $x_{n+1} = x_1$ and $x_{n+2} = x_2$.










The precision. The precision of this estimate may be determined from the
following information (which has been derived in the present paper):


E(d’,.») = o + Pais 


. m ft (° ca (: = 1 b f(s)? ds 
a n—p/\n-1 o b—-a ’ 
Pp 


Var (d*,.p) < Var (8,,p) + 40,90" ; 


V (5. ) pa sisal om 
ar eur) = (n — 1)W(n, p)’ 
where W(n, p) is given in Table I. 


TABLE V

p | $\sigma_x$ | $\sigma_y$ | $\sigma_z$

1 | 18.90 | 184.62 | 11.22
2 | 1.21 | 1.88 | 10.56
3 | .88 | 1.85 | 10.30
4 | .87 | 1.84 | 10.12
5 | .86 | 1.83 | 10.01





In case Toa is sufficiently small (this is determined by the requirements of the 
given problem), then Table IV may be used directly to determine the approximate
probable error in using $d_{n,p}$ as an estimate of $\sigma$.


An example. As a practical example of the use of the method of differences
when the trend is unknown and of the stability of the statistic $d^2_{n,p}$ with respect
to p, we mention a recent problem at Aberdeen Proving Ground which had to do
with estimating the accuracy with which certain photographic measurements
locate a moving object. Ballistic Cameras were used to determine horizontal
x and y, and vertical z coordinates (all in feet) of an airplane traveling about
160 mph at an elevation of about 35,000 feet. An automatic pilot was in use in
the airplane as it flew over a three mile course. At one second intervals for a
period of 70 seconds two Ballistic Cameras, 5000 feet apart, were used to locate
the plane. Since the plane was traveling pretty much in the y direction one
would expect that first differences would yield a standard error in y far in excess
of its true one; that second differences would furnish a much better estimate;
and that perhaps third differences would yield a still more trustworthy one. No
matter what order of difference is used we never expect such an estimate to be
too small. In this problem, the standard errors in x, y, z as estimated from
differences of certain orders, p, were as given in Table V.
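
The method of section 11 can be sketched in a few lines of code. The Python fragment below is only an illustration under stated assumptions: the function name `difference_variance_estimate` and the two toy series are hypothetical and do not come from the paper, and the divisor is taken to be (n - p) times the binomial coefficient C(2p, p), as in the description of the method.

```python
from math import comb

def difference_variance_estimate(x, p):
    """Estimate sigma^2 from the p-th order differences of the series x.

    Hypothetical helper: sums the squared p-th differences and divides
    by (n - p) * C(2p, p).
    """
    n = len(x)
    if not 0 < p < n:
        raise ValueError("need 0 < p < n")
    diffs = list(x)
    for _ in range(p):
        # Form the next difference column from the current one.
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return sum(d * d for d in diffs) / ((n - p) * comb(2 * p, p))

# Toy data (assumed for illustration): a pure quadratic trend with no error,
# and a trend-free alternating series.
quadratic_trend = [i * i for i in range(10)]
alternating = [0, 1, 0, 1, 0]
```

For the error-free quadratic trend the estimate is positive for p = 1 and p = 2 and vanishes for p = 3, which mirrors the remark that low-order differences overstate the dispersion when a trend is present.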
THE EFFICIENCY OF SEQUENTIAL ESTIMATES AND WALD’S
EQUATION FOR SEQUENTIAL PROCESSES


By J. Wolfowitz


Columbia University


1. Summary. Let n successive independent observations be made on the
same chance variable whose distribution function $f(x, \theta)$ depends on a single
parameter $\theta$. The number n is a chance variable which depends upon the
outcomes of successive observations; it is precisely defined in the text below. Let
$\theta^*(x_1, \cdots, x_n)$ be an estimate of $\theta$ whose bias is $b(\theta)$. Subject to certain
regularity conditions stated below, it is proved that

$$\sigma^2(\theta^*) \ge \left(1 + \frac{db}{d\theta}\right)^2 \left[ En\, E\left(\frac{\partial \log f}{\partial \theta}\right)^2 \right]^{-1}.$$

When $f(x, \theta)$ is the binomial distribution and $\theta^*$ is unbiased the lower bound
given here specializes to one first announced by Girshick [3], obtained under no
doubt different conditions of regularity. When the chance variable n is a constant
the lower bound given above is the same as that obtained in [2], page 480,
under different conditions of regularity.$^1$

Let the parameter $\theta$ consist of $l$ components $\theta_1, \cdots, \theta_l$ for which there are
given the respective unbiased estimates $\theta_1^*(x_1, \cdots, x_n), \cdots, \theta_l^*(x_1, \cdots, x_n)$.
Let $\|\lambda_{ij}\|$ be the non-singular covariance matrix of the latter, and $\|\lambda^{ij}\|$ its
inverse. The concentration ellipsoid in the space of $(k_1, \cdots, k_l)$ is defined as

$$\sum_{i,j} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2.$$

(This valuable concept is due to Cramér.) If a unit mass be uniformly distributed
over the concentration ellipsoid, the matrix of its products of inertia
will coincide with the covariance matrix $\|\lambda_{ij}\|$. In [4] Cramér proves that no
matter what the unbiased estimates $\theta_1^*, \cdots, \theta_l^*$ (provided that certain regularity
conditions are fulfilled), when n is constant their concentration ellipsoid
always contains within itself the ellipsoid

$$\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2,$$

where

$$\mu_{ij} = nE\left(\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right).$$

$^1$ To whom this result is to be ascribed is not clear from the context in which Professor
Cramér describes it (in [2]). After the present paper was completed the author learned of
the papers by Rao [8] and Aitken and Silverstone [9], both of which deal with this question.
The author is indebted to Prof. M. S. Bartlett for drawing his attention to these papers.
Consider now the sequential procedure of this paper. Let $\theta_1^*, \cdots, \theta_l^*$ be,
as before, unbiased estimates of $\theta_1, \cdots, \theta_l$, respectively, recalling, however,
that the number n of observations is a chance variable. It is proved that the
concentration ellipsoid of $\theta_1^*, \cdots, \theta_l^*$ always contains within itself the ellipsoid

$$\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$$

where

$$\mu_{ij} = En\, E\left(\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right).$$

When n is a constant this becomes Cramér’s result (under different conditions
of regularity).

In section 7 is presented a number of results related to the equation
$EZ_n = En\,EX$, which is due to Wald [6] and is fundamental for sequential
analysis.


2. Introduction. Let X be a chance variable whose distribution function
$f(x, \theta)$ depends on the parameter $\theta$. It is assumed that X either has a probability
density function (which we then denote by $f(x, \theta)$) or that it can take only
an at most denumerable number of discrete values (in the latter case
$f(x, \theta) = P\{X = x\}$, where the latter symbol denotes the probability of the
relation in braces). Let $\omega = x_1, x_2, \cdots$ be an infinite sequence of observations
on X, and let $\Omega$ be the space of “points” $\omega$. Let there be given an infinite
sequence of Borel measurable functions $\varphi_1(x_1), \varphi_2(x_1, x_2), \cdots, \varphi_j(x_1, \cdots, x_j), \cdots$
defined for all $\omega$ in $\Omega$, such that each takes only the values zero and one. It is
well known that the function $f(x, \theta)$ defines a measure (probability) on a Borel
field in $\Omega$. We assume that everywhere in $\Omega$, except possibly on a set whose
probability is zero for all $\theta$ under consideration, at least one of the functions $\varphi_1, \varphi_2, \cdots$
takes the value one. Let $n(\omega)$ be the smallest integer at which this occurs.
Thus $n(\omega)$ is a chance variable.

In statistical applications the chance variable n(w) may be interpreted as a 
rule for terminating a sequence of observations on the chance variable X, the 
probability of termination being one, and the decision to terminate depending 
only upon the observations obtained. A sequential test is an example of this 
procedure. The converse is, however, not true, because the process described 
above does not require that any statistical decision should be reached when the 
process of drawing observations is terminated. 
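
The definition of n(ω) as the first index at which one of the zero-one functions takes the value one can be sketched as follows. This is a minimal illustration, not part of the paper: the name `stopping_time`, the way the functions are passed as a single callable `phi(j, observations)`, and the rule "stop when the running sum reaches 3" are all assumptions made for the example.

```python
def stopping_time(observations, phi):
    """Return the smallest j with phi(j, (x_1, ..., x_j)) == 1, or None.

    `phi` plays the role of the sequence of zero-one functions: it may
    look only at the first j observations.
    """
    seen = []
    for j, x in enumerate(observations, start=1):
        seen.append(x)
        if phi(j, tuple(seen)) == 1:
            return j
    return None  # the rule did not fire on this finite stretch of data

# Hypothetical rule: terminate as soon as the running sum reaches 3.
def sum_reaches_three(j, xs):
    return 1 if sum(xs) >= 3 else 0
```

Applied to the observations 1, 0, 1, 1, 1 the rule fires at the fourth observation; on a stretch such as 0, 0 it has not yet fired, which corresponds to a point ω whose termination lies further along the sequence.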

An “estimate” of $\theta$ is a function $\theta^*(x_1, \cdots, x_n)$ of the observations $x_1, \cdots, x_n$
(those obtained prior to the “termination” of the process of drawing observations).
In the sequel we shall limit ourselves to estimates whose second moments
are finite. The estimate is “unbiased” if $E\theta^*$, the expected value of $\theta^*$, is $\theta$.
When this is not so $E\theta^* - \theta$ is called the bias, $b(\theta)$, of $\theta^*$. In general the bias
is a function of $\theta$. It is obvious that the function $\theta^*$ may be undefined on a set
of points $(x_1, \cdots, x_n)$ whose probability is zero for all $\theta$ under consideration.


In the present paper we shall be concerned with an upper bound on the efficiency
of a sequential estimate, or, more precisely, with a lower bound on its
variance. This lower bound is intimately related to certain results on the efficiency
of the maximum likelihood estimate from a sample of fixed size. This is
not surprising since fixed-size sampling is a special instance of sequential sampling.
The results obtained in this paper are also obviously and intimately
related to those due to Cramér [4] and those described by him in [2], pp. 477-488.
Naturally the conditions of regularity (restrictions on $f(x, \theta)$, $\theta^*$, etc.) under
which the results are proved are different. For example, no restrictions on the
sequential sampling procedure need appear in the statement of a theorem which
deals only with samples of fixed size.

The argument below proceeds as if $f(x, \theta)$ were a probability density function.
The results apply equally well to the case where $f(x, \theta)$ is the probability function
of a discrete chance variable provided:

1). Integration is replaced by summation wherever this is obviously required.

2). The phrase “almost all points” in a Euclidean space of any finite
dimensionality is understood

a). as all points in the space with the possible exception of a set of Lebesgue
measure zero, when $f(x, \theta)$ is a probability density function

b). as all points in the space with the possible exception of points one of whose
coordinates is a member of the set Z, when $f(x, \theta)$ is the probability function of a
discrete chance variable. The set Z consists of all points z such that $f(z, \theta) = 0$
identically for all $\theta$ under consideration.


3. Conditions of regularity. In this section we shall formulate the restrictions
which we impose on f, the estimates, and the sequential process. They are
intended to be such as will be satisfied in most cases of statistical interest. No
doubt they can be weakened, but the author has decided against attempting to
do so here. The list may seem long for two reasons. First, the assumptions which,
for example, lead to validation of differentiation under the integral sign, etc.,
are seldom formulated explicitly in the literature. Second, the presence of a
sequential procedure means that additional restrictions must be imposed.

In this section we assume that $\theta$ is a single parameter. The case where $\theta$ has
more than one component is treated later.

(3.1). The parameter $\theta$ lies in an open interval D of the real line. D may consist
of the entire line or of an entire half-line.


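(3.2). The derivative $\frac{\partial f}{\partial \theta}$ exists for all $\theta$ in D and almost all x. We define
$\frac{\partial \log f}{\partial \theta}$ as zero whenever $f(x, \theta) = 0$; thus $\frac{\partial \log f}{\partial \theta}$ is defined for all $\theta$ in D and
almost all x. We postulate that $E\left(\frac{\partial \log f}{\partial \theta}\right) = 0$ and that $E\left(\frac{\partial \log f}{\partial \theta}\right)^2$
be not zero for all $\theta$ in D.

(3.3). $E\left(\sum_{i=1}^{n} \left| \frac{\partial \log f(x_i, \theta)}{\partial \theta} \right| \right)^2$
exists for all $\theta$ in D.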
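(3.4). Let $R_j$, $(j = 1, 2, \cdots)$, be the set of points $(x_1, \cdots, x_j)$ in the j-dimensional
Euclidean space such that

$$\varphi_i(x_1, \cdots, x_i) = 0 \quad (i = 1, 2, \cdots, j - 1), \qquad \varphi_j(x_1, \cdots, x_j) = 1.$$

For any integral j there exists a non-negative L-measurable function $T_j(x_1, \cdots, x_j)$
such that

a). $\left| \theta^*(x_1, \cdots, x_j)\, \frac{\partial}{\partial \theta} \prod_{i=1}^{j} f(x_i, \theta) \right| < T_j(x_1, \cdots, x_j)$
for all $\theta$ in D and almost all $(x_1, \cdots, x_j)$ in $R_j$,

b). $\int_{R_j} T_j(x_1, \cdots, x_j)\, dx_1 \cdots dx_j$
is finite.

(3.5). Let

$$t_j(\theta) = \int_{R_j} \theta^*(x_1, \cdots, x_j) \prod_{i=1}^{j} f(x_i, \theta)\, dx_i, \qquad (j = 1, 2, \cdots).$$

We postulate the uniform convergence of the series

$$\sum_{j} \frac{dt_j(\theta)}{d\theta}$$

(the existence of $\frac{dt_j(\theta)}{d\theta}$ is a consequence of Assumption (3.4)) for all $\theta$ in D.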


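4. The case of one parameter. In this section we assume that $f(x, \theta)$ depends
on a single parameter $\theta$. In sections 5 and 6 we shall discuss the case when $\theta$
is a vector with more than one component.

We have

$$E\,\frac{\partial \log f(x, \theta)}{\partial \theta} = 0$$

by (3.2). Define the chance variable

$$Y_n = \sum_{i=1}^{n} \frac{\partial \log f(x_i, \theta)}{\partial \theta}.$$

By an argument almost identical with that of [1], Theorem 1, or of Theorem 7.1
below, we have

(4.1)  $EY_n = 0.$

From Theorem 7.2 below we obtain

(4.2)  $\sigma^2(Y_n) = En\, E\left(\frac{\partial \log f(x, \theta)}{\partial \theta}\right)^2.$

Let $\theta^*(x_1, \cdots, x_n)$ be an estimate of $\theta$ such that

$$E\theta^* = \theta + b(\theta).$$

Then

(4.3)  $\sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, \cdots, x_j) \prod_{i=1}^{j} f(x_i, \theta)\, dx_i = \theta + b(\theta).$

Differentiation of both members of (4.3) with respect to $\theta$ (Assumptions (3.4)
and (3.5)) gives

(4.4)  $E\theta^* Y_n = 1 + \frac{db}{d\theta}.$

From (4.1) it follows that (4.4) gives the covariance between $\theta^*$ and $Y_n$. Since the
square of a covariance cannot exceed the product of the two variances (Schwarz’s
inequality), we have from (4.2)

(4.5)  $\sigma^2(\theta^*) \ge \left(1 + \frac{db}{d\theta}\right)^2 \left[ En\, E\left(\frac{\partial \log f(x, \theta)}{\partial \theta}\right)^2 \right]^{-1}.$

When the bias $b(\theta)$ is constant, for example when $b(\theta) = 0$ in case $\theta^*$ is an
unbiased estimate, we have from (4.5)

(4.6)  $\sigma^2(\theta^*) \ge \left[ En\, E\left(\frac{\partial \log f(x, \theta)}{\partial \theta}\right)^2 \right]^{-1}.$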


The equality sign in (4.6) will hold if $\theta^*$ may be written as $Z'(\theta)Y_n + Z''(\theta)$,
where $Z'$ and $Z''$ are functions of $\theta$. However, $\theta^*$ itself should not be a function
of $\theta$ if our argument is to remain valid. The subject is connected with the
question of the existence of a sufficient estimate.

Let $f(x, \theta)$ be defined as follows:

$$f(x, \theta) = \theta^x (1 - \theta)^{1-x}, \qquad (x = 0 \text{ or } 1;\ 0 < \theta < 1).$$

Then

$$\frac{\partial \log f(x, \theta)}{\partial \theta} = \frac{x}{\theta} - \frac{(1 - x)}{(1 - \theta)}, \qquad E\left(\frac{\partial \log f}{\partial \theta}\right)^2 = \frac{1}{\theta(1 - \theta)}.$$

Suppose $\theta^*$ is unbiased. Then $\sigma^2(\theta^*) \ge \theta(1 - \theta)(En)^{-1}$, a result first given by
Girshick [3] under unspecified regularity conditions.
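
The binomial case can be checked with exact arithmetic. The sketch below is illustrative only: the function names, the value θ = 1/4, and the sequential rule "observe until the first success" are assumptions chosen for the example, and no attempt is made to verify the regularity conditions of section 3 for that rule. The estimate used is the indicator of a success on the first observation, which is unbiased with variance θ(1 - θ).

```python
from fractions import Fraction

def bernoulli_information(theta):
    """E (d log f / d theta)^2 for f(x, theta) = theta^x (1 - theta)^(1 - x)."""
    total = Fraction(0)
    for x in (0, 1):
        prob = theta if x == 1 else 1 - theta
        score = Fraction(x) / theta - Fraction(1 - x) / (1 - theta)
        total += prob * score ** 2
    return total

def variance_lower_bound(theta, expected_n):
    """Right member of (4.6): [En * E(d log f / d theta)^2]^(-1)."""
    return 1 / (expected_n * bernoulli_information(theta))

theta = Fraction(1, 4)
# Assumed rule: sample until the first success, so n is geometric and En = 1/theta.
expected_n = 1 / theta
# Assumed estimate: 1 if the first observation is a success, else 0.
variance_of_estimate = theta * (1 - theta)
bound = variance_lower_bound(theta, expected_n)
```

With these choices the information is 1/(θ(1 - θ)) = 16/3, the bound is θ(1 - θ)/En = 3/64, and the variance of the assumed estimate, 3/16, lies above it, as (4.6) requires.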

Let the functions $\varphi_1, \varphi_2, \cdots$ be such that $n(\omega)$ is a constant. We are then
dealing with samples of fixed size. The result (4.5) is then given in [2], p. 480,
under different conditions of regularity.


5. Regularity conditions for the case when $\theta$ has more than one component.
We suppose that $\theta = (\theta_1, \cdots, \theta_l)$ and that simultaneous estimates
$\theta_1^*(x_1, \cdots, x_n), \cdots, \theta_l^*(x_1, \cdots, x_n)$ of the components of $\theta$ are under discussion.
In the sequel we shall limit ourselves to the case when these estimates are all
unbiased.

We postulate the following regularity conditions which are sufficient to validate
section 6:

(5.1). The covariance matrix of the estimates $\theta_1^*, \cdots, \theta_l^*$ is non-singular for all
$\theta$ in D (this time D is an open interval of the l-dimensional parameter space).

(5.2). The conditions of section 3 are satisfied for each $\theta_i$ and $\theta_i^*$ $(i = 1, \cdots, l)$.


6. The ellipsoid of concentration when $\theta$ has more than one component. Let
$\theta = (\theta_1, \cdots, \theta_l)$.

We shall first describe briefly the result of Cramér [4] which refers to samples
of fixed size $n > l$. Let $\theta_i^*(x_1, \cdots, x_n)$ be an unbiased estimate of
$\theta_i$, $(i = 1, \cdots, l)$. Let $\|\lambda_{ij}\|$ be the non-singular covariance matrix of the
$\theta_i^*$, and let $\|\lambda^{ij}\|$ be its inverse. The “ellipsoid of concentration” in the space
of points $(k_1, \cdots, k_l)$ is defined as

(6.1)  $\sum_{i,j=1}^{l} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2.$

If a unit mass be distributed uniformly over this ellipsoid it will have the point
$(\theta_1, \cdots, \theta_l)$ as its center of gravity and $\lambda_{ij}$ as its product of inertia about the
corresponding axes. Cramér proves that, subject to certain regularity conditions,
there is a fixed ellipsoid

(6.2)  $\sum_{i,j=1}^{l} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$

where

$$\mu_{ij} = nE\left(\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right),$$

which is always contained entirely within the concentration ellipsoid of any set
of unbiased estimates. The two ellipsoids coincide only under certain conditions,
among which is that the $\theta_i^*$ be jointly sufficient estimates of the $\theta_i$.

Let us now consider the sequential procedure of this paper and postulate the
regularity conditions of section 5. Let

$$K = \|k_{ij}\|$$

be a matrix with real elements such that $|K| = 1$ and let

$$K^{-1} = \|k^{ij}\|$$

be its inverse. Let

$$\|\theta\| = \begin{Vmatrix} \theta_1 \\ \vdots \\ \theta_l \end{Vmatrix}, \qquad \|\theta^*\| = \begin{Vmatrix} \theta_1^* \\ \vdots \\ \theta_l^* \end{Vmatrix}, \qquad \|\gamma\| = \begin{Vmatrix} \gamma_1 \\ \vdots \\ \gamma_l \end{Vmatrix}$$

be column matrices. Suppose

(6.3)  $\|\gamma\| = K\|\theta\|.$

Then

(6.4)  $\|\theta\| = K^{-1}\|\gamma\|.$

Define

$$\|\gamma^*\| = \begin{Vmatrix} \gamma_1^* \\ \vdots \\ \gamma_l^* \end{Vmatrix} = K\|\theta^*\|.$$

From section 4 we have

(6.5)  $En\, E\left(\frac{\partial \log f(x, \theta)}{\partial \gamma_1}\right)^2 \ge [\sigma^2(\gamma_1^*)]^{-1}$

where the differentiation by which $\frac{\partial \log f}{\partial \gamma_1}$ is obtained is performed with $\gamma_2, \cdots, \gamma_l$
held constant. Consider the last $(l - 1)$ rows of K as fixed and $(k_{11}, k_{12}, \cdots, k_{1l})$
as free to vary subject only to the restriction that $|K| = 1$. The left member
of (6.5) is then a fixed quantity, while the right member is a function of the first
row of K. The inequality (6.5) must remain valid for all admissible
$(k_{11}, \cdots, k_{1l})$. Hence (6.5) will remain valid if the right member of (6.5) is
replaced by its maximum with respect to $(k_{11}, \cdots, k_{1l})$. We shall obtain this
maximum and find that (6.5) then implies a result about the minimal ellipsoid
of concentration.

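The problem is therefore to minimize $\sigma^2(\gamma_1^*)$. Now

(6.6)  $\sigma^2(\gamma_1^*) = \sum_{i,j} \lambda_{ij} k_{1i} k_{1j}.$

The family of ellipsoids in the space of $(k_{11}, \cdots, k_{1l})$

(6.7)  $\sum_{i,j} \lambda_{ij} k_{1i} k_{1j} = c,$

where c is a running parameter, has all centers located at the origin. Let
$(k_{11}^0, \cdots, k_{1l}^0)$
be the sought-for maximizing values of $(k_{11}, \cdots, k_{1l})$. From the definitions of
$K$ and $K^{-1}$ we have

(6.8)  $\sum_{i} k_{1i} k^{i1} = 1$

where $(k^{11}, k^{21}, \cdots, k^{l1})$ are constants. It follows that the minimum value
$c_0$ of $\sigma^2(\gamma_1^*)$ is such that the ellipsoid

(6.9)  $\sum_{i,j} \lambda_{ij} k_{1i} k_{1j} = c_0$

is tangent to the hyperplane (6.8) at the point $(k_{11}^0, \cdots, k_{1l}^0)$. Now the tangent
plane to (6.9) at this point is given by

(6.10)  $\sum_{i,j} \lambda_{ij} k_{1i}^0 k_{1j} = c_0.$

From (6.8) and (6.10) we obtain

(6.11)  $c_0 k^{j1} = \sum_{i} \lambda_{ij} k_{1i}^0, \qquad (j = 1, \cdots, l).$

Hence

(6.12)  $c_0 \sum_{i} \lambda^{ij} k^{i1} = k_{1j}^0, \qquad (j = 1, \cdots, l)$

from which

(6.13)  $c_0 \sum_{i,j} \lambda^{ij} k^{i1} k^{j1} = 1.$

We have

(6.14)  $\frac{\partial \log f}{\partial \gamma_1} = \sum_{i} k^{i1} \frac{\partial \log f}{\partial \theta_i}, \qquad \left(\frac{\partial \log f}{\partial \gamma_1}\right)^2 = \sum_{i,j} k^{i1} k^{j1}\, \frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}.$

From (6.5), (6.13), (6.14), and the definition of $c_0$ we conclude that

(6.15)  $\sum_{i,j} \mu_{ij} k^{i1} k^{j1} \ge \sum_{i,j} \lambda^{ij} k^{i1} k^{j1}$

where

(6.16)  $\mu_{ij} = En\, E\left(\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right).$

We may restate (6.15) as follows: The concentration ellipsoid

(6.17)  $\sum_{i,j} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$

of the unbiased estimates $\theta_1^*, \cdots, \theta_l^*$ always contains within itself the ellipsoid

(6.18)  $\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$

where the $\mu_{ij}$ are defined by (6.16).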

The question of the coincidence of the two ellipsoids is connected with the 
question of the existence of sufficient estimates. It may be difficult to state 
any general results about the concentration ellipsoid of biased estimates without 
postulating some relationships among the biases and/or their derivatives. 
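
The tangency argument of (6.8) to (6.13), that the least value of a positive definite quadratic form on a hyperplane is the reciprocal of the inverse form evaluated at the hyperplane's coefficients, can be checked numerically. The sketch below is a hypothetical two-dimensional illustration: the matrix `lam`, the coefficient vector `a` (standing in for the constants of (6.8)), and all function names are assumptions made for the example.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def bilinear(m, u, v):
    """Sum over i, j of m[i][j] * u[i] * v[j]."""
    return sum(m[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

def minimum_on_hyperplane(lam, a):
    """Minimize k' lam k subject to a . k = 1, following (6.12) and (6.13)."""
    lam_inv = inverse_2x2(lam)
    c0 = 1 / bilinear(lam_inv, a, a)                      # analogue of (6.13)
    k0 = [c0 * sum(lam_inv[i][j] * a[i] for i in range(2))
          for j in range(2)]                              # analogue of (6.12)
    return c0, k0

# Assumed covariance matrix and hyperplane coefficients.
lam = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
a = [Fraction(1), Fraction(2)]
c0, k0 = minimum_on_hyperplane(lam, a)
```

Here the minimizing point lies on the hyperplane, the quadratic form takes the value c0 there, and any other point of the hyperplane, for instance (1, 0), gives a value at least as large.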


7. On Wald’s equation and related results in sequential analysis. In sec- 
tion 4 we referred to a proof by Blackwell [1] of an equation due to Wald [5] 
which is fundamental in the Wald theory of sequential tests of statistical hypothe- 
ses. Here we shall give a perhaps simpler proof of this equation, and then prove 
several new and related results of general interest for sequential analysis. 

The results of Theorems 7.2 and 7.3 below can be obtained by differentiation
of Wald’s fundamental identity of sequential analysis ([6], [7]). However, the
conditions under which we obtain these results are less stringent than any so far
found sufficient to establish the identity and the validity of differentiating it.
Theorem 7.4 and its corollaries refer to sequential processes where the chance
variables may have different distributions or even be dependent. In the future
we hope to return to the question of finding all central moments of $Z_n$, the
problem of generalizing the fundamental identity, and related questions.

For Theorems 7.1, 7.2, and 7.3 we shall assume a chance variable X whose
cumulative distribution function $F(x)$ is subject only to whatever restrictions
may be explicitly imposed on it in each theorem. We assume the existence of a
general sequential process such as is described above, which is subject only to
such restrictions as may be explicitly formulated in each theorem. The sequential
process of course defines the chance variable n. Let $x_1, x_2, \cdots$ be successive
independent observations on X. We define $Z_n = \sum_{i=1}^{n} x_i$. If $E(X)$ and
$\sigma^2(X)$ exist we shall denote them by $w$ and $\sigma^2$, respectively.

THEOREM 7.1 (Wald [5], Blackwell [1]). Suppose $w$ and $En$ exist. Then

(7.1)  $E(Z_n - nw) = 0.$

The following theorem, which is a sort of partial converse of Theorem 7.1, is
proved concomitantly with Theorem 7.1:

THEOREM 7.1.1. If $EZ_n$ exists, and if either $P\{X > 0\} = 0$ or $P\{X < 0\} = 0$,
then $w$ and $En$ both exist, and

$$EZ_n = wEn.$$

Actually the same proof suffices for a somewhat stronger form of Theorem 7.1.1:

THEOREM 7.1.2. If $EZ_n$ exists, and if

$$E(X_i \mid n = j) \ge 0 \quad (\text{or} \le 0)$$

for all positive integral j such that $P\{n = j\} \ne 0$, and all $i \le j$, then $w$ and $En$
both exist, and

$$EZ_n = wEn.$$

THEOREM 7.2. If $E\left(\sum_{i=1}^{n} |x_i - w|\right)^2$ exists, then $\sigma^2$ and $En$ both exist, and

(7.2)  $E(Z_n - nw)^2 = \sigma^2 En.$
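We have

(7.3)  $E(Z_n - nw) = E\left(\sum_{i=1}^{n} (x_i - w)\right) = \sum_{j=1}^{\infty} \int_{R_j} \left(\sum_{i=1}^{j} (x_i - w)\right) \prod_{m=1}^{j} dF(x_m) = \sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m).$

Also

(7.4)  $\sum_{i=j}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m) = P\{n \ge j\}\, E(X - w) = 0.$

Hence

(7.5)  $\sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m) = 0.$

From this (7.1) follows.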
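Suppose now that the conditions of Theorem 7.2 are fulfilled. We have

(7.6)  $E(Z_n - nw)^2 = \sum_{j=1}^{\infty} \int_{R_j} \left(\sum_{i=1}^{j} (x_i - w)\right)^2 \prod_{m=1}^{j} dF(x_m)$
$\qquad = \sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - w)^2 \prod_{m=1}^{i} dF(x_m) + 2 \sum_{j=2}^{\infty} \sum_{s=1}^{j-1} \sum_{i=j}^{\infty} \int_{R_i} (x_s - w)(x_j - w) \prod_{m=1}^{i} dF(x_m).$

Let $s < j$ be any two positive integers. Then

(7.7)  $\sum_{i=j}^{\infty} \int_{R_i} (x_s - w)(x_j - w) \prod_{m=1}^{i} dF(x_m) = 0.$

Hence

(7.8)  $\sum_{j=2}^{\infty} \sum_{s=1}^{j-1} \sum_{i=j}^{\infty} \int_{R_i} (x_s - w)(x_j - w) \prod_{m=1}^{i} dF(x_m) = 0.$

In a similar manner we obtain

(7.9)  $\sum_{i=j}^{\infty} \int_{R_i} (x_j - w)^2 \prod_{m=1}^{i} dF(x_m) = \sigma^2 P\{n \ge j\}.$

From (7.6), (7.8), and (7.9) it therefore follows that

(7.10)  $E(Z_n - nw)^2 = \sigma^2 \sum_{j=1}^{\infty} P\{n \ge j\} = \sigma^2 \sum_{j=1}^{\infty} jP\{n = j\} = \sigma^2 En$

which is the desired result.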
It remains to prove the validity of rearranging the series in (7.3) and (7.6).
First, we have

(7.11)  $\sum_{i=j}^{\infty} \int_{R_i} |x_j - w| \prod_{m=1}^{i} dF(x_m) = P\{n \ge j\}\, E|X - w|.$

Hence it follows that

(7.12)  $\sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} |x_j - w| \prod_{m=1}^{i} dF(x_m) = \sum_{j=1}^{\infty} P\{n \ge j\}\, E|X - w| = E|X - w| \sum_{j=1}^{\infty} jP\{n = j\} = E|X - w|\, En.$

This justifies the rearrangement of terms in the series in (7.3). Second, the
series (7.6) is dominated by the series

(7.13)  $\sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - w)^2 \prod_{m=1}^{i} dF(x_m) + 2 \sum_{j=2}^{\infty} \sum_{s=1}^{j-1} \sum_{i=j}^{\infty} \int_{R_i} |x_s - w| \cdot |x_j - w| \prod_{m=1}^{i} dF(x_m)$

all of whose terms are positive. The series (7.13) converges because

(7.14)  $E\left(\sum_{i=1}^{n} |x_i - w|\right)^2 < +\infty.$

Hence the rearrangement of the series (7.6) is valid.
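
Equations (7.1) and (7.2) can be verified by exact enumeration for a small bounded sequential process. The sketch below is an illustration under assumptions that are not in the paper: X takes the values -1 and 2 with probability 1/2 each, sampling stops when the absolute value of the running sum reaches 2 or after five observations, and the helper names are hypothetical.

```python
from fractions import Fraction

VALUES = (-1, 2)                           # assumed values of X
PROBS = (Fraction(1, 2), Fraction(1, 2))   # assumed probabilities
MAX_N = 5                                  # forces termination

def stops(path):
    """Zero-one rule: True when sampling terminates after `path`."""
    return abs(sum(path)) >= 2 or len(path) >= MAX_N

def stopped_paths():
    """List (path, probability) for every sequence at its stopping time."""
    done, open_paths = [], [((), Fraction(1))]
    while open_paths:
        path, prob = open_paths.pop()
        if path and stops(path):
            done.append((path, prob))
        else:
            for v, p in zip(VALUES, PROBS):
                open_paths.append((path + (v,), prob * p))
    return done

def expect(g):
    """Expected value of g(x_1, ..., x_n) over the stopped process."""
    return sum(prob * g(path) for path, prob in stopped_paths())

w = sum(v * p for v, p in zip(VALUES, PROBS))
sigma2 = sum((v - w) ** 2 * p for v, p in zip(VALUES, PROBS))
En = expect(len)
EZ = expect(sum)
second_moment = expect(lambda path: (sum(path) - len(path) * w) ** 2)
```

Because the arithmetic is exact, the two identities can be compared as equalities: `EZ` against `w * En` for (7.1), and `second_moment` against `sigma2 * En` for (7.2).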
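In the sequel we require certain sets $\bar{R}_j$ $(j = 1, 2, \cdots)$ which we shall define
now. Let $R_{ij}$, $i \le j$, be the totality of all points $(x_1, \cdots, x_j)$ such that

(7.15)  $(x_1, \cdots, x_i) \in R_i.$

Let $R^j$ be the j-dimensional Euclidean space. Then

(7.16)  $\bar{R}_j = R^j - \sum_{i=1}^{j} R_{ij}.$

We shall now prove:

THEOREM 7.3. Suppose that $E\left(\sum_{i=1}^{n} |x_i - w|\right)^3$ and $En\left(\sum_{i=1}^{n} |x_i - w|\right)$
exist.$^2$ Then

(7.17)  $E(Z_n - nw)^3 = w_3 En + 3\sigma^2 En(Z_n - nw)$

where

$$w_3 = E(X - w)^3$$

exists.

$^2$ The author has succeeded in proving that the existence of $E\left(\sum_{i=1}^{n} |x_i - w|\right)^3$ implies
the existence of $En\left(\sum_{i=1}^{n} |x_i - w|\right)$. The proof will be published subsequently in
connection with other results.

PROOF: We have

(7.18)  $E(Z_n - nw)^3 = \sum_{j=1}^{\infty} \int_{R_j} \left[\sum_{i=1}^{j} (x_i - w)\right]^3 \prod_{m=1}^{j} dF(x_m)$
$\qquad = \sum_{j=1}^{\infty} \int_{R_j} \sum_{i=1}^{j} (x_i - w)^3 \prod_{m=1}^{j} dF(x_m)$
$\qquad + 3 \sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)(x_i - w)^2 \prod_{m=1}^{j} dF(x_m)$
$\qquad + 3 \sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m)$
$\qquad + 6 \sum_{j=3}^{\infty} \int_{R_j} \sum_{i=3}^{j} \sum_{s=2}^{i-1} \sum_{t=1}^{s-1} (x_t - w)(x_s - w)(x_i - w) \prod_{m=1}^{j} dF(x_m).$

Considering the first term in the right member of (7.18), it follows that

(7.19)  $\sum_{j=1}^{\infty} \int_{R_j} \sum_{i=1}^{j} (x_i - w)^3 \prod_{m=1}^{j} dF(x_m) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \int_{R_j} (x_i - w)^3 \prod_{m=1}^{j} dF(x_m) = \sum_{i=1}^{\infty} w_3 P\{n \ge i\} = \sum_{i=1}^{\infty} i w_3 P\{n = i\} = w_3 En.$

All the rearrangements of terms in the operations involved in the proof of
Theorem 7.3 are legitimate because the various series are absolutely convergent.
As for the second term in the right member of (7.18), we have

(7.20)  $\sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)(x_i - w)^2 \prod_{m=1}^{j} dF(x_m) = \sum_{s=1}^{\infty} \sum_{i=s+1}^{\infty} \sum_{j=i}^{\infty} \int_{R_j} (x_s - w)(x_i - w)^2 \prod_{m=1}^{j} dF(x_m)$
$\qquad = \sigma^2 \sum_{s=1}^{\infty} \sum_{i=s+1}^{\infty} \int_{\bar{R}_{i-1}} (x_s - w) \prod_{m=1}^{i-1} dF(x_m) = \sigma^2 \sum_{s=1}^{\infty} \sum_{i=s}^{\infty} \int_{\bar{R}_i} (x_s - w) \prod_{m=1}^{i} dF(x_m).$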

We now operate on En(Z, — nw), and obtain 


En(Z, — nw) = > [ it@-w a dF (tm) 
(7.21) ——— 


=f. — w II dF (2). 


yy t=j 
We observe that


Xf ie; - wy TT aren) 


t=7 


- iz [, @- w IL aren) 


+ Sf @-w Il area. 


To evaluate the left member of (7.22), we proceed as follows: It is easy to see that 


(7.23) Zz _ fi ~ ¥) IT dF (tm) = 0. 
t=] t = 
Moreover, when s > j, 


a) Ef @-w aren) =f @-w I aren. 


os 


Hence 


(7.25) = i (2; — w) II dF (zm) = 3 [. (x; — w) II dF (xm). 
Therefore

(7.26)  $En(Z_n - nw) = \sum_{j=1}^{\infty} \sum_{s=j}^{\infty} \int_{\bar{R}_s} (x_j - w) \prod_{m=1}^{s} dF(x_m).$


It remains now to consider the third term of the right member of (7.18).
We have

(7.27)  $\sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m) = \sum_{s=1}^{\infty} \sum_{i=s+1}^{\infty} \sum_{j=i}^{\infty} \int_{R_j} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m).$

Now, suppose that in the expression

(7.28)  $V_{jis} = \int_{R_j} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m)$

where $j \ge i > s$, we integrate with respect to all $x_m$ for which $m \ge i$. Then
it is not difficult to see that

(7.29)  $\sum_{j=i}^{\infty} V_{jis} = 0$

for all s and i such that $1 \le s < i$. Hence from (7.27)

(7.30)  $\sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m) = 0.$

In a similar way it is shown that the fourth term of the right member of (7.18)
is zero.

The desired result (7.17) is a direct consequence of (7.18), (7.19), (7.20),
(7.26), and (7.30).
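
The third-moment relation (7.17) can likewise be checked exactly on a small example. The following sketch rests on assumptions made only for illustration: X is a zero-one variable with P{X = 1} = 1/4, sampling stops at the second success or after six observations, and `expect` is a hypothetical recursive helper.

```python
from fractions import Fraction

VALUES = (0, 1)                            # assumed values of X
PROBS = (Fraction(3, 4), Fraction(1, 4))   # assumed probabilities
MAX_N = 6                                  # forces termination

def stops(path):
    """True when sampling terminates after the observations in `path`."""
    return sum(path) >= 2 or len(path) >= MAX_N

def expect(g, path=(), prob=Fraction(1)):
    """Expected value of g over the stopped process, by recursion on paths."""
    if path and stops(path):
        return prob * g(path)
    return sum(expect(g, path + (v,), prob * p) for v, p in zip(VALUES, PROBS))

w = sum(v * p for v, p in zip(VALUES, PROBS))
sigma2 = sum((v - w) ** 2 * p for v, p in zip(VALUES, PROBS))
w3 = sum((v - w) ** 3 * p for v, p in zip(VALUES, PROBS))

def deviation(path):
    """Z_n - n w for a stopped path."""
    return sum(path) - len(path) * w

left = expect(lambda path: deviation(path) ** 3)
right = w3 * expect(len) + 3 * sigma2 * expect(lambda path: len(path) * deviation(path))
```

The quantities `left` and `right` are the two members of (7.17); with exact fractions they can be compared directly.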

Consider now an infinite sequence of chance variables $X_1, X_2, \cdots$, which
need not have the same distribution and which may be dependent (in which
case they must satisfy the obvious consistency relationships). We take successive
observations on these chance variables and define a sequential process
as above, which is subject only to such restrictions as we shall explicitly state.
Let $Z_n$ maintain its previous definition.

THEOREM 7.4. Suppose that

(7.31)  $\nu_i = E(X_i \mid n \ge i)$

exists for all positive integral i for which $P\{n \ge i\} \ne 0$. In those cases write

(7.32)  $\nu_i' = E(|X_i - \nu_i| \mid n \ge i).$

Suppose also that the series

(7.33)  $\sum_{i} (\nu_1' + \cdots + \nu_i') P\{n = i\}$

converges. Then

(7.34)  $E\left[Z_n - \sum_{i=1}^{n} \nu_i\right] = 0.$

It is regrettable but unavoidable that the mean values $\nu_i$ and $\nu_i'$ entering into
(7.33) and (7.34) be conditional. The fundamental reason is that the sequential
process may drastically modify the distribution of dependent chance variables,
so that their distribution for our purposes can only be considered in conjunction
with the sequential process itself. Consider the following example:


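$$P\{X_1 = -1\} = \tfrac{1}{2}, \qquad P\{X_1 = 1\} = \tfrac{1}{2},$$
$$P\{X_2 = -2 \mid X_1 = -1\} = \tfrac{1}{2}, \qquad P\{X_2 = -1 \mid X_1 = -1\} = \tfrac{1}{2},$$
$$P\{X_2 = 1 \mid X_1 = 1\} = \tfrac{1}{2}, \qquad P\{X_2 = 2 \mid X_1 = 1\} = \tfrac{1}{2}.$$

We have $E(X_2) = 0$. Suppose we define the following sequential process:
If $X_1 = -1$, $n = 1$, and if $X_1 = 1$, $n = 2$. It is then clear that for our purposes
$X_2$ can take no negative values and the fact that $E(X_2) = 0$ is of no use to us.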
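
The example can be worked through with exact arithmetic. The sketch below is an illustration, not part of the paper; it assumes the distribution just described, with P{X_2 = 2 | X_1 = 1} = 1/2 taken as the completion implied by E(X_2) = 0, and all variable names are hypothetical.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Joint distribution of (X1, X2) assumed from the example.
joint = {(-1, -2): half * half, (-1, -1): half * half,
         (1, 1): half * half, (1, 2): half * half}

def stopping_time(x1):
    """The sequential process of the example: n = 1 if X1 = -1, else n = 2."""
    return 1 if x1 == -1 else 2

# Unconditional mean of X2.
E_X2 = sum(p * x2 for (x1, x2), p in joint.items())

# Conditional means nu_i = E(X_i | n >= i) as in (7.31).
nu1 = sum(p * x1 for (x1, _), p in joint.items())          # P{n >= 1} = 1
p_n_ge_2 = sum(p for (x1, _), p in joint.items() if stopping_time(x1) >= 2)
nu2 = sum(p * x2 for (x1, x2), p in joint.items()
          if stopping_time(x1) >= 2) / p_n_ge_2
nu = {1: nu1, 2: nu2}

def centered_sum(x1, x2):
    """Z_n minus the sum of nu_1, ..., nu_n for one outcome."""
    n = stopping_time(x1)
    z = x1 if n == 1 else x1 + x2
    return z - sum(nu[i] for i in range(1, n + 1))

left_member_734 = sum(p * centered_sum(x1, x2) for (x1, x2), p in joint.items())
```

The unconditional mean of X_2 is zero while its mean given n ≥ 2 is 3/2, and it is the conditional value that makes the left member of (7.34) vanish.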
If, however, the chance variables $X_1, X_2, \cdots$ are independent, this difficulty
disappears, and we have the following.

COROLLARY 1 TO THEOREM 7.4. If the chance variables $X_1, X_2, \cdots$ are independent,
we have Theorem 7.4 with $\nu_i = E(X_i)$, and $\nu_i' = E|X_i - \nu_i|$.

If further all the $X_i$ have the same distribution, we see that Theorem 7.1 is
a special case of Theorem 7.4, since the convergence of the series (7.33) is then
a consequence of the existence of $w$ and $En$. From this argument we see, however,
that it is not necessary that all the $X_i$ have the same distribution, and we
may write the following generalization of Theorem 7.1:

COROLLARY 2 TO THEOREM 7.4. Let the $X_i$ be independent with, in general,
different distributions. Suppose, however, that all $\nu_i$ are equal, and all $\nu_i'$ are equal,
except perhaps for those i such that $P\{n \ge i\} = 0$. Suppose further that $En$ exists.
Then (7.1) holds.

Among possible fields of application of Theorem 7.4 are sequential tests of 
composite statistical hypotheses, and the random walk of a particle governed 
by probability distributions which are functions of time and the position of the 
particle. The extension of this theorem to vector chance variables is straight- 
forward. The extension to higher moments may present difficulties. We hope 
to return to some of these questions in the future. 

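PROOF OF THEOREM 7.4. This is very elementary. We have

(7.35)  $E\left(Z_n - \sum_{i=1}^{n} \nu_i\right) = \sum_{j=1}^{\infty} \int_{R_j} \left[\sum_{i=1}^{j} (x_i - \nu_i)\right] dF(x_1, \cdots, x_j)$
$\qquad = \sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - \nu_j)\, dF(x_1, \cdots, x_i)$
$\qquad = \sum_{j=1}^{\infty} P\{n \ge j\}\, E(X_j - \nu_j \mid n \ge j) = 0.$

The rearrangement of the series is valid because

(7.36)  $\sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} |x_j - \nu_j|\, dF(x_1, \cdots, x_i) = \sum_{j=1}^{\infty} \nu_j' P\{n \ge j\} = \sum_{j=1}^{\infty} (\nu_1' + \cdots + \nu_j') P\{n = j\}$

which converges by (7.33).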
ESTIMATION OF LINEAR FUNCTIONS OF CELL PROPORTIONS


By John H. Smith


Bureau of Labor Statistics


Summary. In this article certain contributions are made to the theory of 
estimating linear functions of cell proportions in connection with the methods 
of (1) least squares, (2) minimum chi-square, and (3) maximum likelihood. 
Distinctions among these three methods made by previous writers arise out of 
(1) confusion concerning theoretical vs. practical weights, (2) neglect of effects 
of correlation between sampling errors, and (3) disagreement concerning methods 
of minimization. Throughout the paper the equivalence of these three methods 
from a practical point of view has been emphasized in order to facilitate the 
integration and adaptation of existing statistical techniques. To this end: 

1. The method of least squares as derived by Gauss in 1821-23 [6, pp. 224-228],
in which weights in theory are chosen so as to minimize sampling variances,
is herein called the ideal method of least squares and the theoretical estimates 
are called ideal linear estimates. This approach avoids confusion between 
practical approximations and theoretical exact weights. 

2. The ideal method of least squares is applied to uncorrelated linear func- 
tions of correlated sample frequencies to determine the appropriate quantity 
to minimize in order to derive ideal linear estimates in sample-frequency prob- 
lems. This approach leads to a sum of squares of standardized uncorrelated 
linear functions of sampling errors in which statistics are to be substituted in 
numerators. 

3. A new elementary method is used to reduce the sum of squares in (2)— 
before substitution of statistics—to Pearson’s expression for chi-square. In 
this result, obtained without approximation, appropriate substitution of sta- 
tistics shows that the denominators of chi-square should be treated as constant 
parameters in the differentiation process in order to minimize chi-square in 
conformity with the ideal method of least squares. 

4. The ideal method of minimum chi-square, derived in (3) as the sample- 
frequency form of the ideal method of least squares, yields ideal linear estimates 
in terms of the unknown parameters in the denominators of chi-square. When 
these parameters are estimated by successive approximations in such a way as 
to be consistent with statistics based on them, it is shown that the method of 
minimum chi-square leads to maximum likelihood statistics. 

5. An iterative method which converges to maximum likelihood estimates is 
developed for the case in which observations are cross-classified and first order 
totals are known. In comparison with Deming’s asymptotically efficient 
statistics, it is shown that, in a certain sense, maximum likelihood statistics 
are superior for any given value of n—especially in small samples. 

6. The method of proportional distribution of marginal adjustments is developed.
This method yields estimates of expected cell frequencies whose
efficiency is 100 per cent when universe cell frequencies are proportional, a
condition closely approximated in most practical surveys for which first order
totals are available from complete censuses. Whether this favorable condition 
is satisfied or not, the method yields results which are easy to interpret and it 
has many computational advantages from the point of view of economy of time 
and effort. 

Throughout the article discussion is confined to the estimation of parameters 
whose relationships to cell proportions are linear. However, most of the results 
can be extended to the case of non-linear relationships, the necessary qualifica- 
tions being similar to those in curve-fitting problems when the function to be 
fitted is not linear in its parameters. In this case, of course, least squares esti- 
mates are not linear estimates. In particular, obvious extensions of the general 
proofs in sections 5 and 6 make them applicable to the non-linear case. Thus 
even when relationships are non-linear, it can be shown that the method of 
minimum chi-square is the sample-frequency form of the method of least squares 
which leads (by means of appropriate successive approximations) to maximum 
likelihood statistics in sample-frequency problems. This principle which 
establishes the equivalence of the methods of least squares, minimum chi-square, 
and maximum likelihood greatly facilitates the integration and adaptation of 
existing techniques developed in connection with these important methods of 
estimation. 


1. Introduction. This article deals with problems of statistical estimation in 
which the parameters to be estimated are cell proportions or linear functions of 
them. A simple illustration of this type of problem is that of estimating p, 
the proportion of white men in a population classified by race and sex. From
a sample of n persons selected at random from such a population, the desired 
proportion can be estimated by simply taking the sample proportion of white 
men as an estimate of the corresponding cell proportion in the population or 
universe. This estimate is unbiased for all possible values of p and its sampling 
variance is p(1 — p)/n—assuming, for simplicity, that sampling is done with 
replacements. Whether a more accurate unbiased estimate of p can be derived 
depends on whether or not any other relevant information concerning the cell 
proportions in the universe is available. For example, it may be known that 
all of the white portion of the population is composed of married couples so that 
in the universe the number of white men is exactly equal to the number of white 
women. This knowledge implies that half the proportion of whites provides an 
unbiased estimate of p which is far more accurate than the sample proportion 
of white men. In fact, the sampling variance of half the proportion of whites 
is equal to (2p)(1 — 2p)/4n—less than half the sampling variance of the pro- 
portion of white men. 
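
The comparison of the two sampling variances quoted above can be checked numerically. The following is only an illustrative sketch: the function names, the sample size, and the trial values of p are assumptions introduced here and do not come from the article.

```python
def var_sample_proportion(p, n):
    """Sampling variance p(1 - p)/n of the sample proportion of white men."""
    return p * (1 - p) / n


def var_half_proportion_whites(p, n):
    """Sampling variance (2p)(1 - 2p)/4n of half the sample proportion of whites."""
    return (2 * p) * (1 - 2 * p) / (4 * n)


def variance_ratio(p, n):
    """Ratio of the second variance to the first; algebraically (1 - 2p) / (2(1 - p))."""
    return var_half_proportion_whites(p, n) / var_sample_proportion(p, n)


if __name__ == "__main__":
    n = 100  # hypothetical sample size
    for p in (0.05, 0.25, 0.45):  # hypothetical values; here p lies between 0 and 1/2
        print(p, var_sample_proportion(p, n),
              var_half_proportion_whites(p, n), variance_ratio(p, n))
```

The ratio reduces to (1 - 2p)/(2(1 - p)), which is below one half for every p between 0 and 1/2, in agreement with the statement in the text.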

The term ideal linear estimate will be used to refer to any statistic which satisfies
the criteria of estimation implied by the foregoing discussion; that is, an
ideal linear estimate is any estimate which (1) is a linear function of the sample
observations; (2) is recognizable as unbiased by the research worker; and (3)
has minimum sampling variance among estimates which have properties (1)
and (2). These important criteria of estimation will now be stated in more
technical language.

Let $n_1$, $n_2$, and $n_3$ represent the number of (1) white men, (2) white women
and (3) non-white persons, respectively, in samples of $n$ persons. Since any
linear function with a constant term can be reduced to the homogeneous form
by adding an appropriate multiple of the identity

$$n_1 + n_2 + n_3 - n = 0, \tag{1.1}$$

it is possible, without loss of generality, to confine attention to linear estimates
of the form

$$T = a_1 n_1 + a_2 n_2 + a_3 n_3, \tag{1.2}$$

which are recognizable as unbiased. In this example, the research worker is
assumed to know that the cell proportions in the universe are

$$p_1, p_2, p_3 = p,\ p,\ 1 - 2p. \tag{1.3}$$

Hence, absence of bias implies that the expected value of $T$,

$$E(T) = a_1 n p_1 + a_2 n p_2 + a_3 n p_3 = (a_1 + a_2 - 2a_3)np + na_3, \tag{1.4}$$

is identically equal to $p$; in other words, that

$$n(a_1 + a_2 - 2a_3) - 1 = 0 \quad \text{and} \quad na_3 = 0. \tag{1.5}$$


The ideal linear estimate is derived by finding values of $a_1$, $a_2$, and $a_3$ which
minimize the sampling variance of $T$ subject to equations (1.5) as side
conditions.¹ In this way it can be shown that half the sample proportion of whites
is actually the ideal linear estimate of p. For more general problems, the 
process of minimization of sampling variances with the aid of Lagrange multi- 
pliers involves expressions which are complicated algebraically. For this reason 
it is usually easier to derive ideal linear estimates of parameters which are linear 
functions of cell proportions by the ideal method of least squares which is 
presented in section 4. 

Like other least squares estimates, an ideal linear estimate of a linear function 
of cell proportions depends on ideal least squares weights. Since these weights 


¹ In this example, it is possible to solve equations (1.5) for $a_2$ in terms of $a_1$, drop
subscripts, and substitute in the formula for the sampling variance of $T$ to obtain a quadratic
in $a$ to be minimized.




are, in general, functions of variances and covariances of sample frequencies,
the theoretical connotation of the term "ideal" makes it preferable to other
terms such as "optimum" and "best." In this connection it should be
emphasized that (1) the sampling variance of linear estimates is insensitive to
small errors in estimating ideal weights, and (2) the process of deriving practical 
approximations to ideal linear estimates automatically provides maximum 
likelihood estimates of the ideal weights. Thus the estimation of weights is 
perfectly objective and the best practical approximations to ideal linear esti- 
mates are expressed in terms of sample observations. This degree of objec- 
tivity is rare in statistical estimation as a brief consideration of regression prob- 
lems will illustrate. 

In ordinary regression problems, the ideal weights are inversely proportional 
to error variances. It is usually necessary to draw upon past experience to 
estimate relative weights because satisfactory estimates of error variances 
are rarely available in terms of sample observations. From the present point 
of view, the widespread use of equal weights implies the subjective “assumption” 
that all error variances are equal. (Maximum likelihood estimates of regression 
coefficients require, in addition, the even more subjective assumption of nor- 
mality.) In spite of these (usually implicit) subjective assumptions, dis- 
cussions of optimum properties of least squares regression coefficients based on 
ideal weights in terms of unknown parameters are highly commendable because 
(1) sampling variance is not very sensitive to small errors in weights and (2) 
properties of theoretical ideal linear estimates furnish a simple basis for dis- 
cussion of the properties of practical statistics based on any reasonably good 
approximations to the exact ideal weights. In any case, it is important to 
know what the ideal weights are in terms of unknown parameters because 
research workers can make better estimates if they know what quantities should 
be estimated than they could otherwise. 


2. Estimation of a single parameter. In sample-frequency problems, least 
squares weights are rarely given explicitly or even implied by information 
available to the research worker. Since the hypothetical example used in 
Section 1 is a trivial special case from this point of view, a more realistic ex- 
ample is presented in this section. Since the biological interpretation of this 
problem is presented in detail in all but the first of the many editions of Fisher’s 
well-known book [3] it is sufficient here to consider only the statistical problem. 
The four cell proportions are 


$$p_1, p_2, p_3, p_4 = (2 + \theta)/4,\ (1 - \theta)/4,\ (1 - \theta)/4,\ \theta/4, \tag{2.1}$$


and the parameter $\theta$ is to be estimated from the set of sample frequencies

$$n_1, n_2, n_3, n_4 = 1997,\ 906,\ 904,\ 32, \tag{2.2}$$


obtained in a sample of $n = 3839$ selected at random from an infinite universe.
Fisher considers five different statistics, $T_1$, $T_2$, $T_3$, $T_4$, and $T_5$, so it will
be convenient to use the symbol $T_6$ for the ideal linear estimate. Consider
the class of linear unbiased estimates of the form

$$T = a_1 n_1 + a_2 n_2 + a_3 n_3 + a_4 n_4, \tag{2.3}$$

where absence of bias implies that

$$2a_1 + a_2 + a_3 = 0 \quad \text{and} \quad a_1 - a_2 - a_3 + a_4 - 4/n = 0. \tag{2.4}$$

Minimizing the sampling variance of $T$ in equation (2.3) subject to side
conditions based on equations (2.4) yields the ideal linear estimate $T_6$ defined
by the equation

$$n(1 + 2\theta)T_6 = 3\theta n_1 - 3\theta n_2 - 3\theta n_3 + (4 - \theta)n_4. \tag{2.5}$$
The exact sampling variance of $T_6$,

$$\sigma_{T_6}^2 = \frac{2\theta(1 - \theta)(2 + \theta)}{n(1 + 2\theta)}, \tag{2.6}$$

is used by Fisher as the asymptotic sampling variance of any efficient estimate
of $\theta$. The exact sampling variance of the ideal linear estimate is especially
appropriate as the asymptotic sampling variance of the maximum likelihood
estimate $T_4$ because $T_4$ is the limit of an iterative process designed to estimate
$T_6$ as closely as possible from sample data by using successive approximations
to $T_4$ for $\theta$ in equation (2.5). The limit of this process (which is, of course,
only an approximation to $T_6$) can be obtained by substituting the symbol $T_4$
for both $T_6$ and $\theta$ in equation (2.5) and solving the resulting quadratic equation
which can be reduced to

$$nT_4^2 - (n_1 - 2n_2 - 2n_3 - n_4)T_4 - 2n_4 = 0,$$

an equation which is identical, except for notation, with Fisher's equation of
maximum likelihood of which $T_4$ is the positive solution.
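
The iterative process and its limit can be illustrated with the sample frequencies of equation (2.2). The sketch below is a hedged illustration rather than the author's own computing scheme: it reads equation (2.5) as n(1 + 2θ)T = 3θ(n1 - n2 - n3) + (4 - θ)n4, and the function names, starting value, and tolerance are assumptions introduced here.

```python
import math

# Sample frequencies from equation (2.2).
N1, N2, N3, N4 = 1997, 906, 904, 32
N = N1 + N2 + N3 + N4  # 3839


def ideal_linear_step(theta):
    """Solve equation (2.5) for the estimate when theta is used in the weights."""
    numerator = 3 * theta * (N1 - N2 - N3) + (4 - theta) * N4
    return numerator / (N * (1 + 2 * theta))


def iterate_to_limit(theta0, tol=1e-12, max_iter=200):
    """Feed each estimate back into the weights until it stops changing."""
    theta = theta0
    for _ in range(max_iter):
        new_theta = ideal_linear_step(theta)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


def positive_root():
    """Positive solution of n*T**2 - (n1 - 2*n2 - 2*n3 - n4)*T - 2*n4 = 0."""
    b = N1 - 2 * N2 - 2 * N3 - N4
    return (b + math.sqrt(b * b + 8 * N * N4)) / (2 * N)


if __name__ == "__main__":
    print(iterate_to_limit(0.5))  # hypothetical starting value
    print(positive_root())
```

Under these assumptions the iteration and the quadratic agree, and both reproduce the value 0.035712 quoted for the maximum likelihood estimate later in this section. A single step starting from 0.035785 gives approximately 0.035717, the other figure quoted there.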

The foregoing result is a comparatively simple illustration of the general 
principle that the maximum likelihood estimate of any linear function of cell 
proportions is the limit of an iterative process designed to approximate the 
corresponding linear estimate as closely as possible by means of sample fre- 
quencies. Since the accuracy of estimates of least squares relative weights 
increases with size of sample, maximum likelihood statistics have, in an asymp- 
totic sense for large samples, the same optimum properties which are possessed 
in an exact sense (even for small samples) by the corresponding ideal linear 
estimates. Thus the results obtained by means of the theory of large samples 
are supported by the approach to estimation problems by means of ideal linear 
estimates. In addition, the latter approach facilitates the integration of
available techniques as explained in later sections. 
It is true that the optimum properties of maximum likelihood statistics can
be presented in terms of the theory of large samples, but the fact that a given 
method of estimation yields a statistic whose asymptotic sampling variance is 
a minimum does not imply that the same technique will yield a minimum 
variance statistic for any given small value of n. For example, it is well known 
that the median is a maximum likelihood estimate of the midpoint of a double 
exponential universe. Nevertheless, in samples of three observations from 
such a universe, another statistic—4/9 of the mean plus 5/9 of the median— 
has greater relative advantage over the median than the median has over the 
mean. 

Fisher’s discussion of the relative efficiencies of his five alternative consistent 
statistics suggests that it is impossible to formulate objective criteria for making 
choices among alternative statistics such that each statistic will be used whenever 
its sampling variance is smallest. Consider the sequence of universes generated
by letting $\theta$ vary from zero to unity. In general, each value of $\theta$ would
determine which of Fisher's five statistics would have smallest sampling variance
for that particular universe for any given value of $n$. In comparison with
any other single statistic, the statistic $T_4$ would usually have smaller sampling
variance, but there are notable exceptions. For example, in the absence of
linkage when $\theta$ is equal to one-fourth, the statistic $T_2$ is the ideal linear estimate
and its sampling variance is smaller than that of $T_4$, at least for certain small
values of $n$. For this reason, Fisher used $T_2$ in preference to $T_4$ as the basis for
testing the significance of linkage. The statistic $T_5$, derived by Fisher's method
of minimum chi-square, is also of special interest. Fisher's method of minimum
chi-square yields statistics which differ from the corresponding maximum
likelihood statistics because Fisher considers the denominators as variables in
the process of differentiation instead of considering them as unknown
parameters to be estimated by identifying them with the corresponding statistics
in the numerators after differentiation. Arguments of later sections tend to
show that the latter method is more appropriate. In this example, it can be
shown that if $T_5$ were substituted for the corresponding parameter in the
denominators of chi-square (and treated as a parameter) the minimization of
chi-square with respect to statistics in its numerators would be exactly equivalent
to substituting 0.035785, the numerical value of $T_5$, for $\theta$ in equation (2.5) and
solving for $T_6$ to obtain 0.035717, a value which is much closer to 0.035712,
the numerical value of the maximum likelihood estimate $T_4$, than to Fisher's $T_5$.
In problems of estimation chi-square should be minimized in order to obtain 
efficient statistics—not to obtain a small criterion for testing goodness of fit— 
and it should be minimized in a manner consistent with this purpose. Whether 
or not it is possible to derive an even smaller value for a quantity called chi- 
square should be considered to be irrelevant in either estimation problems or 
tests of significance. It is difficult to present these ideas in more technical 
language because it is possible to construct trivial hypothetical universes for 
which Fisher’s method of minimum chi-square provides statistics which are 
superior in certain respects to the corresponding maximum likelihood statistics.
Nevertheless, it seems clear that the ideal linear estimate usually has smaller 
sampling variance than the maximum likelihood statistic which, in turn, usually 
has smaller sampling variance than any other given practical statistic. Evi- 
dence presented in later sections tends to show that these advantages are more 
important in small samples than in cases in which the theory of large samples 
is applicable. 


3. The "ideal" method of least squares. When sample observations are
uncorrelated in successive samples and parameters to be estimated are linear
functions of the expected values of the sample observations, the method of least
squares yields ideal linear estimates of the parameters provided that the weight of
each observation is inversely proportional to its variance in successive samples.
Although the minimum sampling variance property among linear unbiased 
estimates is seldom stressed, this principle of weighting has been presented in 
connection with the method of least squares for more than a hundred years. 
In order to emphasize the theoretical nature of weights which depend on vari- 
ances which are usually unknown in practice and to distinguish the method 
based on such weights from the more familiar method of least squares with 
equal weights, the method which yields ideal linear estimates will be called the 
ideal method of least squares. 

Discussion of the general problem of estimating linear functions of cell pro- 
portions can be facilitated by making use of results obtained by other writers— 
notably Gauss (as reported by Whittaker and Robinson [6]) and Pearson [4]. 
According to Whittaker and Robinson, “the first writer to connect the method 
[of ideal least squares] with the theory of probability was Gauss” [6, p. 224]. 
In his Theoria Motus proof of 1809, Gauss derived the “most probable value” 
[6, p. 223] of a parameter (i.e., the statistic which satisfies the criterion now
called maximum likelihood) for the case in which sample observations are sta- 
tistically independent and normally distributed. In his Theoria Combinationis 
proof of 1821-23, Gauss “abandoned the ‘metaphysical’ basis” [6, p. 220] of 
his earlier work and derived the method herein called the ideal method of least 
squares (without approximation) from the criteria of (1) minimum variance and 
(2) absence of bias for the case in which “the mean value of [the covariance of 
a pair of errors] is zero” [6, p. 224]. Since the covariances of uncorrelated linear 
functions are zero whether they are statistically independent or not, it follows 
from the work of Gauss that the ideal method of least squares applied to un- 
correlated linear functions of sample frequencies yields ideal linear estimates. 
In other words, the ideal method of least squares implies the following six steps: 

1. From the set of $k + 1$ sample frequencies construct $k$ linear functions
which are uncorrelated in successive samples.

2. From each function subtract its expected value in terms of the unknown
parameters to find its sampling error.

3. Write the ratio of each sampling error to its own standard error in the
form of a fraction.

4. Sum the squares of these standardized uncorrelated sampling errors to
obtain a quantity called chi-square.

5. Substitute statistics² for the parameters in the numerators of chi-square.

6. Minimize the sum of squares of residuals with respect to each statistic
in turn (subject to appropriate side conditions in case linear functions
not implied in preceding steps are known).

This series of six steps can be summarized by the single statement that the 
function to minimize is the sum of squares of standardized uncorrelated resid- 
uals. Actually this statement is oversimplified because even though sampling 
errors are both uncorrelated and standardized, the corresponding residuals 
are, in general, neither standardized nor uncorrelated. 




4. Pearson’s expression for chi-square. As defined by Pearson [4], chi- 
square is the sum of squares of a set of k standardized uncorrelated linear func- 
tions of sampling errors in a set of k + 1 correlated sample frequencies. A set 
of k standardized uncorrelated linear functions can be constructed in an infinite 
number of ways, but each set can be obtained from any of the others by means 
of an orthogonal transformation. Thus the sum of squares is the same no 
matter what set is originally chosen. As his set of standardized uncorrelated 
linear functions, Pearson chose those determined by the axes of the correlation 
ellipse for which he gave the required sum of squares in terms of “minors” or 
cofactors of the correlation determinant of the first $k$ sample frequencies.
Pearson reduced this complicated expression to the now familiar form

$$\chi^2 = \sum_{i=1}^{k+1} \frac{(n_i - np_i)^2}{np_i}, \tag{4.1}$$

where $p_i$ is the proportion in the $i$th cell in the universe and $n_i$ is the frequency
in the $i$th cell of a sample of $n$ observations selected at random from an infinite
universe (or with replacements from a finite universe).

The widespread misunderstanding of the nature of chi-square seems to be 
based primarily on the facts that 

1. Pearson’s rule for degrees of freedom is inadequate (see section 5), and 

2. Pearson’s expression for chi-square can be derived by approximate methods 

as well as by exact methods. 

Pearson's derivation of the expression for chi-square by exact methods is
sufficient to show that its derivation by approximate methods involves a paradox
in which different sets of approximations offset each other; however, Pearson's
article is relatively inaccessible and, in addition, his algebraic reductions involve
the minors of a general determinant of the $k$th order. For these reasons, the
following exact derivation is presented in terms of elementary algebra.

² It is convenient to call these variable symbols "statistics"; the quantities whose
squares are summed, "residuals"; and the whole expression "chi-square," even though,
from a certain point of view, these terms are strictly applicable only after the
minimization process. This usage should always be clear from its context.

Since the sum of squares is the same for any set of k standardized uncorrelated 
linear functions of the sampling errors in k + 1 correlated frequencies, a set should 
be chosen for which the algebraic reductions are as easy as possible. From this 
point of view a satisfactory set, which can be written in any of three forms, is 
given by 


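$$\begin{aligned}
y_i &= p_i n_{i+} - p_{i+} n_i \\
&= p_i e_{i+} - p_{i+} e_i \\
&= -p_i e_{i-} - (p_i + p_{i+}) e_i,
\end{aligned} \tag{4.2}$$

where $e_i = n_i - np_i$ and $i+$ and $i-$ refer to classes formed by combining all
classes above the $i$th class and below the $i$th class, respectively.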

By means of the known variances and covariances of the sample frequencies 
in expected value form, 


$$E(e_i^2) = np_i(1 - p_i), \tag{4.3}$$

and

$$E(e_i e_j) = -np_i p_j, \tag{4.4}$$

it can be shown that the variance of $y_i$ is

$$E(y_i^2) = np_i p_{i+}(p_i + p_{i+}), \tag{4.5}$$

and, by using the third expression in equation (4.2) for $y_i$ and the second for
$y_j$, it can be shown that any pair of $y$'s are uncorrelated because

$$E(y_i y_j) = 0 \qquad (i < j). \tag{4.6}$$


Let $z_i$ represent the variable $y_i$ expressed in standard-deviation units. The
square of this standardized uncorrelated linear function of correlated sampling
errors can be written

$$z_i^2 = \frac{(p_i e_{i+} - p_{i+} e_i)^2}{np_i p_{i+}(p_i + p_{i+})}. \tag{4.7}$$

It remains to show that Pearson's expression for chi-square can be obtained
by adding the $k$ values of $z_i^2$ in succession. For this purpose it is convenient
to define

$$\chi_r^2 = \sum_{i=1}^{r} \frac{e_i^2}{np_i} + \frac{e_{r+}^2}{np_{r+}}, \tag{4.8}$$

obtained by combining all classes above the $r$th class.
When $r = k$, the expression in equation (4.8) is the expression to be derived.
It remains to show that $\chi_k^2$ is the sum of squares of $k$ standardized uncorrelated
linear functions of sampling errors; i.e.,

$$\chi_k^2 = \sum_{i=1}^{k} z_i^2. \tag{4.9}$$

For the first cell $e_{1+} = -e_1$ and $p_{1+} = 1 - p_1$. Hence $y_1$ reduces to the negative
of the error in the first frequency and

$$z_1^2 = \frac{e_1^2}{np_1(1 - p_1)} = \frac{e_1^2}{np_1} + \frac{e_{1+}^2}{np_{1+}} = \chi_1^2 \qquad (p_{1+} = 1 - p_1), \tag{4.10}$$

a special case expressed in the required form. The general case is established 
by showing that 


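$$\chi_{r-1}^2 + z_r^2 = \chi_r^2, \tag{4.11}$$

or, alternatively, that

$$\begin{aligned}
z_r^2 &= \chi_r^2 - \chi_{r-1}^2 \\
&= \frac{e_r^2}{np_r} + \frac{e_{r+}^2}{np_{r+}} - \frac{(e_r + e_{r+})^2}{n(p_r + p_{r+})} \\
&= \frac{(p_{r+} e_r^2 + p_r e_{r+}^2)(p_r + p_{r+}) - p_r p_{r+}(e_r^2 + 2e_r e_{r+} + e_{r+}^2)}{np_r p_{r+}(p_r + p_{r+})} \\
&= \frac{p_{r+}^2 e_r^2 - 2p_r p_{r+} e_r e_{r+} + p_r^2 e_{r+}^2}{np_r p_{r+}(p_r + p_{r+})} = \frac{(p_r e_{r+} - p_{r+} e_r)^2}{np_r p_{r+}(p_r + p_{r+})},
\end{aligned} \tag{4.12}$$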

thus establishing the derivation of Pearson’s expression for chi-square. 
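
The identity just established can be checked numerically for any set of frequencies. The sketch below is an illustration only: the function names are assumptions introduced here, the frequencies are those of equation (2.2), and the cell proportions are obtained from equation (2.1) with a trial value of 0.04 for the parameter, chosen arbitrarily for the check.

```python
def pearson_chi_square(counts, probs):
    """Pearson's expression: sum over all k + 1 cells of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def sum_of_standardized_squares(counts, probs):
    """Sum of the k squares z_i^2 built from y_i = p_i e_{i+} - p_{i+} e_i."""
    n = sum(counts)
    errors = [c - n * p for c, p in zip(counts, probs)]
    total = 0.0
    for i in range(len(counts) - 1):
        p_i = probs[i]
        p_plus = sum(probs[i + 1:])    # proportion in all classes above class i
        e_i = errors[i]
        e_plus = sum(errors[i + 1:])   # sampling error in all classes above class i
        y = p_i * e_plus - p_plus * e_i
        total += y * y / (n * p_i * p_plus * (p_i + p_plus))
    return total


if __name__ == "__main__":
    counts = [1997, 906, 904, 32]      # frequencies from equation (2.2)
    probs = [0.51, 0.24, 0.24, 0.01]   # equation (2.1) with a trial value 0.04
    print(pearson_chi_square(counts, probs))
    print(sum_of_standardized_squares(counts, probs))
```

The two printed values agree up to rounding error, whatever trial proportions are used, provided the proportions sum to unity.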
When sampling is done without replacement each variance and covariance
is multiplied by $(N - n)/(N - 1)$ where $N$ is the number of observations in
the universe. Hence, chi-square for this case can be written

$$\chi^2 = \frac{N - 1}{N - n} \sum_{i=1}^{k+1} \frac{e_i^2}{np_i}. \tag{4.13}$$

This expression shows that the factor involving sampling errors is the same 
whether sampling is done with replacement or without replacement. Hence, 
the derivation of least squares statistics is the same for either method of sampling, 
but sampling variances for the simpler case are multiplied by the factor (N — n)/ 
(N — 1) when sampling is done without replacement. 


5. The method of minimum chi-square. The derivation of Pearson's expression
for chi-square completes the first four steps of the ideal method of least
squares outlined in section 3. Hence, the method of minimum chi-square is 
the sample-frequency form of the ideal method of least squares in which only 
two of the six steps remain to be taken. 

In his original article [4] Pearson pointed out that the use of statistics instead 
of parameters would affect the value of chi-square but that such effects would 
usually be so small that no allowance need be made for them in connection with 
tests of significance. It is now well known that the average value of chi-square
is reduced approximately one unit for each parameter estimated from the sample,
and that the main portion of this effect is on the numerators; i.e., in large samples
substituting statistics for parameters in the denominators usually
has a negligible effect on the value of chi-square. By confining the discussion
to the case in which parameters are used in the denominators, it is possible to 
make simple exact statements concerning the main effects in terms of the number 
of squares of standardized uncorrelated linear functions—also known as the 
number of degrees of freedom and the mean value of chi-square. 

When the expected values in the numerators of chi-square can be expressed
as linear functions of $r$ algebraically independent parameters, ideal linear
estimates of the $r$ parameters are determined by substituting statistics for the $r$
parameters and minimizing the resulting expression with respect to each
statistic. In general, such a substitution of statistics for parameters in the
numerators of chi-square reduces the number of degrees of freedom by one unit for
every parameter estimated; that is, the appropriately minimized chi-square
can be analyzed into $k - r$ squares of standardized uncorrelated linear functions
of sampling errors.

The $r$ ideal linear estimates are linear functions of the sample frequencies.
Let $(v_1, v_2, \cdots, v_r)$ be a set of standardized uncorrelated linear functions of
the correlated sampling errors in these statistics and let $(v_1, v_2, \cdots, v_k)$ be a set
of linear functions obtained from the $z_i$'s of section 4 by an orthogonal
transformation. Since the sum of squares is not changed by such a transformation,
chi-square is the sum of the $k$ values of $v_i^2$. The process of substituting
statistics for the $r$ parameters in the numerators of chi-square reduces the values of
the first $r$ $v_i^2$'s to zero without affecting the values of the other $(k - r)$ $v_i^2$'s.

Thus the appropriately minimized chi-square can be analyzed into k — r 
squares of standardized uncorrelated linear functions of sampling errors and is 
therefore said to have k — r degrees of freedom. The mean value of each square 
is the variance of a standardized linear function of sampling errors and is there- 
fore unity by definition. Hence the mean value of the appropriately minimized 
chi-square (with parameters in the denominators) is exactly k — r when r 
statistics are estimated from a set of k + 1 sample frequencies. 

The expression to be minimized is

$$\chi^2 = \sum \frac{(n_i - m_i)^2}{np_i}, \tag{5.1}$$

where $m_i$ is the ideal linear estimate of $np_i$. The set of statistics described
by the equation

$$m_i = n_i \tag{5.2}$$

reduces the value of chi-square to zero, its minimum value. This shows that
the sample cell proportion is the ideal linear estimate of the corresponding
parameter.


Whenever a linear function independent of the sum of the cell proportions is
known, it is possible to take advantage of additional information provided by
the known function by minimizing chi-square subject to an appropriate side
condition. When side conditions are used in this way, the number of degrees
of freedom for the minimized chi-square is equal to the number of side conditions
which are algebraically independent of each other (and of the sum of the cell
proportions). Let the known linear function be written

$$\sum a_i n p_i - m = 0. \tag{5.3}$$


In order to facilitate comparison of the typical equation of maximization
with the corresponding equation of the method of maximum likelihood, it is
convenient to minimize chi-square by maximizing $-\chi^2/2$ subject to a side
condition based on (5.3). The function to be maximized can be written

$$-\chi^2/2 = \sum \frac{(n_i - m_i)^2}{-2np_i} + h\left(\sum a_i m_i - m\right), \tag{5.4}$$

where $h$ is a Lagrange multiplier. Setting the partial derivative of $-\chi^2/2$
with respect to $m_i$ equal to zero, the typical equation for minimizing chi-square
can be written

$$(n_i - m_i)/np_i + ha_i = 0, \tag{5.5}$$

a form which shows that, in general, ideal linear estimates are defined in terms
of unknown parameters. Fortunately, these parameters can usually be
approximated closely by an iterative process. Substituting $m_i$ for both $np_i$ and $m_i$
in equations (5.5) the typical equation in the limiting values of such a process
can be reduced to

$$n_i/m_i - 1 + ha_i = 0, \tag{5.6}$$


a form which is identical with the typical equation (6.6) of maximum likelihood 
derived in section 6. This equality of typical equations implies that whenever 
the denominators of chi-square are estimated in such a way as to be consistent 
with least squares statistics based on them, the method of minimum chi-square 
always leads (by means of approximations necessary in practice) to maximum 
likelihood estimates of parameters which are linear functions of cell proportions. 


6. The method of maximum likelihood. Maximum likelihood estimates of 
linear functions of cell proportions can be obtained by (1) expressing the probability 
function (general term of the multinomial expansion) in terms of the r 
parameters to be estimated; (2) substituting r statistics for the r parameters; 
and (3) maximizing with respect to the r statistics. In practice, this is usually 
accomplished by maximizing the logarithm of the variable factor in step (3) 
which can be written, 


(6.1) $L = \sum_i n_i \log m_i,$


where $m_i$ is the maximum likelihood estimate of $np_i$, the expected value of the 
ith frequency $n_i$ in a sample of n observations classified into (k + 1) classes or 
cells. It is evident that L as written has no maximum with respect to any $m_i$ 
since it increases without bound as $m_i$ increases, but it sometimes has a uniquely 
determined maximum when each of the $m_i$’s is written explicitly in terms of 
fewer than k + 1 algebraically independent statistics. In the general case it is 
easier to maximize L subject to an appropriate set of side conditions, one of 
which must be equivalent to 


(6.2) $m_1 + m_2 + \cdots + m_{k+1} - n = 0.$


When no linear function except the sum is known, the likelihood function 
can be written 


(6.3) $L = \sum_i n_i \log m_i - \left(\sum_i m_i - n\right),$


a function which, subject to equation (6.2), is always equal to that in equation 
(6.1) but which has a uniquely determined maximum. The typical equation of 
maximum likelihood, obtained by setting the partial derivative of L with respect 
to $m_i$ equal to zero, is 


(6.4) $n_i/m_i - 1 = 0,$


an equation which shows that each sample frequency is a maximum likelihood 
estimate of its own expected value. 
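
The property stated for equation (6.4) can be checked numerically. The following is a minimal sketch, assuming three cells with hypothetical sample frequencies (12, 30, 58); the function name and the trial shifts are illustrative only. It evaluates the function in equation (6.3) at the sample frequencies and at a few nearby trial estimates.

```python
import math


def log_likelihood(n_obs, m):
    """Equation (6.3): sum of n_i log m_i minus (sum of m_i minus n)."""
    n = sum(n_obs)
    return sum(ni * math.log(mi) for ni, mi in zip(n_obs, m)) - (sum(m) - n)


# Hypothetical sample frequencies for three cells (n = 100).
n_obs = [12, 30, 58]

# Value of L when each estimate equals its own sample frequency.
at_sample = log_likelihood(n_obs, n_obs)

# Values of L at a few nearby trial estimates (assumed shifts).
shifts = [(1, 0, 0), (-1, 0, 0), (0, 2, -2), (0.5, 0.5, 0.5)]
nearby = [
    log_likelihood(n_obs, [ni + s for ni, s in zip(n_obs, shift)])
    for shift in shifts
]
```

Each trial value in `nearby` falls below `at_sample`, in agreement with the statement that the function in (6.3) has a uniquely determined maximum at the sample frequencies.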

When a linear function such as that in equation (5.3) is known, an improved 
set of maximum likelihood statistics can be found by maximizing 


(6.5) $L = \sum_i n_i \log m_i - \left(\sum_i m_i - n\right) + h\left(\sum_i a_i m_i - m\right).$


The typical equation of maximization is found to be 


(6.6) $n_i/m_i - 1 + h a_i = 0,$


an equation which, as stated above, is identical with equation (5.6). Since 
equation (5.6) was obtained as the limit of an iterative process from the typical 
equation (5.5) for minimizing chi-square subject to the same side condition 
and since each additional side condition affects the typical equation of each 
method in exactly the same way, the method of minimum chi-square and the 
method of maximum likelihood are equivalent for the general case in the sense 
that the method of minimum chi-square always leads to maximum likelihood 
statistics as limits of an iterative process. 
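
The iterative equivalence described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's computing routine: the function names are hypothetical, the data are invented, and the condition on the sum of the estimates is given its own multiplier (called `lam` here) so that both side conditions are solved together at each step. Starting from the sample frequencies as denominators, each pass minimizes chi-square with fixed denominators and then uses the adjusted frequencies as the next set of denominators.

```python
def adjust(n_obs, a, m_target, c):
    """Minimize sum (n_i - m_i)^2 / c_i subject to
    sum m_i = n and sum a_i m_i = m_target, for fixed denominators c_i."""
    n = sum(n_obs)
    # Stationarity gives m_i = n_i + c_i * (lam + h * a_i); the two
    # multipliers follow from the two side conditions (a 2x2 system).
    s0 = sum(c)
    s1 = sum(ci * ai for ci, ai in zip(c, a))
    s2 = sum(ci * ai * ai for ci, ai in zip(c, a))
    r0 = n - sum(n_obs)  # zero by construction
    r1 = m_target - sum(ai * ni for ai, ni in zip(a, n_obs))
    det = s0 * s2 - s1 * s1
    lam = (r0 * s2 - r1 * s1) / det
    h = (s0 * r1 - s1 * r0) / det
    return [ni + ci * (lam + h * ai) for ni, ci, ai in zip(n_obs, c, a)]


def iterate_min_chi_square(n_obs, a, m_target, iterations=20):
    """Reuse each set of adjusted frequencies as the next denominators."""
    c = list(n_obs)  # first approximation: the sample frequencies
    for _ in range(iterations):
        c = adjust(n_obs, a, m_target, c)
    return c


# Hypothetical example: four cells, with the first and last cells known
# to have equal expected frequencies (a = [1, 0, 0, -1], m = 0).
n_obs = [18, 32, 27, 23]
limit = iterate_min_chi_square(n_obs, [1, 0, 0, -1], 0.0)
```

For this example the limiting values are 20.5, 32, 27, 20.5, which are the maximum likelihood estimates obtained by pooling the two cells known to be equal.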


7. Second-order tables with known expected marginal totals. As stated in 
section 2, the integration of available techniques is facilitated by regarding 
maximum likelihood statistics as the best practical approximations to the 
corresponding ideal linear estimates. Since this important principle may not 
be immediately obvious, it will be illustrated for the important special case of 
second-order tables for which the expected marginal totals are known. 

Consider a sample of n observations arranged on two bases of classification 
and presented in a table containing r rows and s columns. The universe of N 
observations has been completely enumerated and classified on each basis 
separately but not cross-classified; i.e., universe totals of first order classes are 
known. 

For the cell in the ith row and the jth column, let $p_{ij}$ represent the universe 
cell proportion; $n_{ij}$, the sample frequency; $np_{ij}$, the expected value of $n_{ij}$; 
and $m_{ij}$, the maximum likelihood estimate of $np_{ij}$. Indicating summation 
by substituting a dot for the letter over which summation is to be performed, 
the known marginal totals satisfy the equations 


(7.1) $Np_{i.} - N_{i.} = 0,$
$Np_{.j} - N_{.j} = 0,$


where $p_{i.}$ and $p_{.j}$ are the universe proportions and $N_{i.}$ and $N_{.j}$ are the known 
universe totals in the ith row and the jth column, respectively. 

When n observations of a random sample are arranged according to two 
bases of classification in a table with r rows and s columns for which the r + s 
marginal totals are known, the typical equation of maximum likelihood can 
be obtained by maximizing, subject to side conditions based on equations (7.1), 
the likelihood function 


(7.2) $L = \sum_i \sum_j n_{ij} \log m_{ij} - \sum_i a_i (m_{i.} - np_{i.}) - \sum_j b_j (m_{.j} - np_{.j}),$


with respect to the maximum likelihood estimates $m_{ij}$, where $a_i$ and $b_j$ are typical 
Lagrange multipliers. Setting the partial derivative with respect to $m_{ij}$ equal 
to zero and transposing, the typical equation of maximum likelihood can be 
written 


(7.3) $n_{ij}/m_{ij} = a_i + b_j.$


Since equations (7.3) are not linear in their unknowns, the reader’s first 
reaction might well be to agree with a certain anonymous critic that “their 
solution is difficult.” This impression of great difficulty is probably the chief 
reason that previous writers have not used the method of maximum likelihood 
for this type of problem even after they had developed a set of techniques ade- 
quate for the solution of the equations of maximum likelihood. In other words, 
all that was needed was the integration of available techniques as will now 
be shown. 

In 1940, Deming and Stephan [2] derived a set of normal equations for the 
adjustment of a set of second-order cell frequencies to known expected marginal 
totals by the method of least squares in which each sample frequency is weighted 
by its own reciprocal. This method yields statistics which are efficient according 
to the theory of large samples, but they do not satisfy the criterion of maximum 
likelihood exactly. In the same article was presented an easier method of 
iterative proportions, which, unfortunately, does not yield least squares sta- 
tistics. In 1942, Stephan [5] developed an improved iterative process which 
yields statistics which satisfy the criterion of least squares with arbitrarily 
chosen weights. The foregoing developments are presented in greater detail 
in Deming’s book [1] in which Deming adapts Stephan’s iterative method to 
the particular case in which each sample frequency is weighted by its own 
reciprocal so as to yield solutions for the normal equations derived in the joint 
article [2]. 

In Deming’s notation, equation 8 of Stephan’s article [5, p. 169] can be written 


(7.4) $m_{ij} = c_{ij}(p_i + q_j - 1) + n_{ij},$


an expression obtained by substituting $c_{ij}$ for $np_{ij}$ in the denominators of chi-square 
and minimizing with respect to the statistics in the numerators. Hence, 
if exact values of the $np_{ij}$ were used for the $c_{ij}$, the Stephan iterative method 
would yield ideal linear estimates. Unless these parameters are implied by 
some hypothesis to be tested, it is necessary, in practice, to estimate the $np_{ij}$ 
from sample data. In order to secure maximum likelihood estimates of expected 
cell frequencies by means of the Stephan iterative method, the adjusted frequencies 
based on first approximations to the $c_{ij}$ should be used as second approximations 
to the $c_{ij}$, etc. In this way, maximum likelihood statistics can 
be derived to any desired degree of approximation. At this point it should 
be emphasized that the preceding statement applies not only to the class of 
problems considered in this section but also to the wider class of problems for 
which the Stephan iterative method provides solutions. 

Unfortunately, theoretical discussions of previous writers contain confusing 
compensating errors which (1) present their own methods in an unnecessarily 
unfavorable light and (2) increase the difficulties involved in the introduction 
of the improvements in techniques suggested in section 9 which involve some 
degree of adaptation of techniques already available. For these reasons, it 
seems necessary to follow the arguments of previous writers in order to show 
the points at which improvements are needed. This can be done most effec- 
tively in connection with Deming’s book [1] where the method of least squares 
is presented in great detail. 

For the special case in which the sampling errors in the observations are un- 
correlated, the ideal criterion of least squares implies that the weight of each 
observation should be inversely proportional to its sampling variance. This 
criterion is accepted as well known by Deming who says that “the principle of 
least squares requires the minimizing of the sum of the weighted squares of the 
residuals” [1, p. 14] where “the weights of two functions are inversely pro- 
portional to their variances” [1, p. 22]. Deming assumes that “there is no 
correlation between the errors in the observations” with the qualification that 
“this assumption covers a wide class of problems, but does fail to cover some.” 
[1, p. 49]. This assumption of uncorrelated errors is not applicable to sample- 
frequency problems, of course, because the sample frequencies are correlated 
with each other in such a way that the reciprocals of the ideal least squares 
weights are not proportional to the sampling variances $np_{ij}q_{ij}$ but rather to 
the expected frequencies $np_{ij}$ which appear in the denominators of chi-square. 
In this connection it is interesting to note that Deming himself insists that 
“there is only one principle of least squares, namely, the minimizing of $\chi^2$.” 
[1, p. 51]. However, the method currently in use for the minimizing of chi-square 
was that given by Fisher [3] which leads to equations which are difficult to solve 
even for such a simple example as the one presented in section 2 above. 
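
The statement that the ideal weights are reciprocals of the expected frequencies rather than of the sampling variances can be illustrated numerically. The sketch below relies on a standard property of the multinomial covariance matrix that is not spelled out in the text: for any set of deviations summing to zero, multiplying by the weights $1/np_i$ and then by the covariance matrix returns the deviations unchanged, so these weights act as an inverse of the covariance matrix on such deviations. The function names and numerical values are hypothetical.

```python
def multinomial_covariance(n, p):
    """Covariance matrix of multinomial frequencies: n(diag(p) - p p')."""
    k = len(p)
    return [
        [n * (p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) for j in range(k)]
        for i in range(k)
    ]


def covariance_times_weighted(n, p, x, weights):
    """Apply the diagonal weights to x, then the covariance matrix."""
    v = multinomial_covariance(n, p)
    wx = [w * xi for w, xi in zip(weights, x)]
    return [sum(v[i][j] * wx[j] for j in range(len(p))) for i in range(len(p))]


n = 50
p = [0.1, 0.2, 0.3, 0.4]
x = [3.0, -1.0, -4.0, 2.0]  # hypothetical deviations n_i - np_i, summing to zero

chi_square_weights = [1 / (n * pi) for pi in p]            # reciprocals of np_i
variance_weights = [1 / (n * pi * (1 - pi)) for pi in p]   # reciprocals of np_i q_i

recovered = covariance_times_weighted(n, p, x, chi_square_weights)
distorted = covariance_times_weighted(n, p, x, variance_weights)
```

Here `recovered` reproduces `x`, while `distorted` does not, which is the sense in which the denominators of chi-square, and not the individual sampling variances, supply the appropriate weights for correlated sample frequencies.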

Deming and Stephan are to be commended for seeking an easier method 
but there is no justification (even as a device for saving effort) for their modifica- 
tion of the “principle of least squares” so as to imply erroneously that 

(1) weights of correlated sample frequencies are inversely proportional to their variances, and 

(2) sample frequencies are, in general, approximately proportional to their own sampling variances. 

Strangely enough, these two errors were applied in combination by Deming and 
Stephan to obtain good practical approximations to the ideal least squares 
weights. It might be argued that the second misleading implication is really 
not an error because it is offered as a simplifying approximation, but it is an 
integral part of both the normal equations approach in the joint article [2] 
and Deming’s adaptation [1] of the Stephan iterative method; that is, in each 
case the method would have to be revised if better approximations to the ideal 
least squares weights were used. More explicitly, Deming (1) uses $n_{ij}$ for Stephan’s 
$c_{ij}$ in equation (7.4); (2) identifies it with the other $n_{ij}$ in the same equation; 
and (3) reduces the equation to a different form thus effectively preventing 
the use of successive approximations to the $c_{ij}$ without returning to Stephan’s 
iterative method in the general form given by equation (7.4) above which 
Deming does not present at all. Results of the joint article [2] are quoted by 
Stephan [5] without any explanation of the nature of the errors, but none of 
these results are used in the development of his iterative method which as noted 
above, is applicable to any arbitrarily chosen set of weights. The fact that 
Stephan corrected the second error without correcting the first implies that the 
weights he actually used are unsatisfactory. In Deming’s adaptation of the 
Stephan iterative method, a much better set of weights is obtained, not by cor- 
recting the first offsetting error overlooked by Stephan, but by resurrecting the 
second offsetting error which Stephan had corrected. Since this error is an 
integral part of Deming’s adaptation, Deming’s theoretical discussion implies 
that his own efficient statistics are only rough approximations which are definitely 
inferior to the inefficient statistics obtained by means of the weights chosen by 
Stephan. These inconsistencies are most clearly brought out by Deming when 
he says: 

“Strictly, in random sampling, the reciprocal of the weight of $n_{ij}$ is $np_{ij}q_{ij}$, which is 
nearly equal to $n_{ij}q_{ij}$ where p and q have their usual connotations. But since factors 
proportional to the weights may be substituted for them, it is sufficient to use $n_{ij}$ as the 
reciprocal of the weight in cell ij, since the values of $q_{ij}$ do not usually vary much over the 
table.” [1, p. 102.] 

In any given problem the seriousness of the error in the first statement in 
the foregoing quotation depends on the variation among the $q_{ij}$’s. In the particular 
example used by Deming the error is of considerable importance because 
the largest $q_{ij}$ is more than 40 per cent larger than the smallest $q_{ij}$. The weights 
actually used by Deming agree with weights implied by the ideal method of least 
squares except for sampling errors in the $n_{ij}$; hence, the error in any relative 
weight converges stochastically to zero so that Deming’s statistics are efficient 
according to the theory of large samples. The efficiency of Deming’s statistics 
is inconsistent with the theory presented by Deming which implies erroneously 
that efficiency of estimation depends on approximate equality of cell proportions. 
If this argument were true it would apply also to the method of maximum 
likelihood and all other methods which yield efficient practical statistics in 
sample-frequency problems. The foregoing discussion, together with the results 
of section 8 show that the theory as presented by Deming has the following 
seriously misleading features: 

(1) it is based on a paradox in which a good final result is obtained by means of compensating errors; 

(2) it presents his efficient statistics in an unnecessarily unfavorable light; 

(3) it emphasizes the irrelevant condition of approximate equality of universe cell proportions; 

(4) it fails to mention the important condition of proportionality by rows and columns; and 

(5) it makes least squares, minimum chi-square, and maximum likelihood seem to be competing alternative methods. 

Of these undesirable characteristics, the last two are probably the most serious 
because they make the effective integration and adaptation of statistical tech- 
niques more difficult. As has been shown in sections 4, 5, and 6, the sample- 
frequency form of the ideal method of least squares is the method of minimum 
chi-square which always leads (by means of appropriate practical approximations 
to unknown weights) to maximum likelihood statistics; in other words, 
the methods are equivalent from a practical point of view. 

Since the ideal method of least squares based on the unknown $np_{ij}$ determines 
fully efficient, but theoretical, ideal linear estimates, the efficiency of practical 
approximations to ideal linear estimates depends on the accuracy with which 
the denominators of chi-square are estimated. For the unknown denominators 
$np_{ij}$, Deming uses the sample frequencies $n_{ij}$ while the method of maximum 
likelihood implies the use of the corresponding maximum likelihood estimates— 
statistics which, in general, have smaller sampling variances. The foregoing 
argument suggests that maximum likelihood statistics are slightly superior to 
Deming’s statistics for any given finite value of n and that their relative advantage 
increases as the sample size decreases. In large samples both methods 
yield efficient statistics because the relative errors in the weights implied by 
either method converge stochastically to zero as n increases. Although the ad- 
vantage of maximum likelihood statistics over Deming’s statistics is unim- 
portant except in small samples, it can be shown that Deming’s choice of weights 
leads to imperfectly compensated negative errors of estimation even in his 
large sample of 33,837 observations. 
Deming weights each sample frequency by its own reciprocal. Positive errors 
of sampling decrease the value of the reciprocal and thus increase the absolute 
size of the required negative adjustments. Negative errors of sampling increase 
the value of the reciprocal and thus decrease the size of the positive adjustment. 
Thus every error of sampling (either positive or negative) leads to a negative 
error of estimation due to inappropriate weighting. Because the sum of all 
adjustments must be zero, these negative errors of estimation are compensated 
on the average but more or less imperfectly. The net effect of this imperfect 
compensation of negative errors of estimation is that Deming’s statistics are 
too small in those cells in which the relative adjustments (either positive or 
negative) are large, and vice versa. In a preliminary draft of this article, 
this type of error of estimation was studied by comparing Deming’s statistics 
with the corresponding maximum likelihood statistics in connection with Deming’s 
example involving 33,837 observations. Although errors of estimation of the 
type under discussion are apparent, they are, of course, extremely small in such 
a large sample. For this reason the large-sample comparison has been deleted 
in favor of simple hypothetical examples designed to throw light on similar errors 
of estimation in statistics derived by Fisher’s method of minimum chi-square 
as well as in those derived by Deming’s adaptation of Stephan’s iterative 
method. 

Consider a set of sample frequencies in a two-by-two table for which all 
expected marginal totals are equal. For this special case, the cell proportions 
on each diagonal are equal and the ideal linear estimate (which is also the 
maximum likelihood estimate) of any cell proportion is the mean of the two 
sample cell proportions on its diagonal. For the same case, Deming’s adaptation 
of the Stephan iterative method yields an estimate for each cell which is pro- 
portional to the harmonic mean of sample proportions on its diagonal while 
Fisher’s method of minimum chi-square yields estimates proportional to the 
corresponding quadratic means. 

As a numerical example of the foregoing problem consider the set of fre- 
quencies 


(7.5) $n_{11}, n_{12}, n_{21}, n_{22} = 1, 4, 3, 2,$


obtained in a sample of 10 observations selected at random from a universe 
in which the cell proportions are known to be 


(7.6) $p_{11}, p_{12}, p_{21}, p_{22} = p,\ 0.5 - p,\ 0.5 - p,\ p.$


As estimates of the parameter p, the ideal linear estimate is .15, Deming’s 
adaptation of the Stephan iterative method yields .14, and Fisher’s method of 
minimum chi-square yields .1545 to four decimal places, the other two estimates 
being exact. The results illustrate the imperfectly compensated errors of 
estimation explained previously. The two sample frequencies on the principal 
diagonal ($n_{11}$ and $n_{22}$) have greater relative dispersion than the frequencies on 
the other diagonal. For this reason, the relative adjustments made by Deming’s 
method are greater and, according to the principle of imperfectly compensated 
negative errors of estimation, the estimate of p obtained by Deming’s method 
is smaller than the ideal linear estimate of p. Fisher’s method of minimum 
chi-square yields an estimate of p which is greater than the ideal linear estimate. 
In fact, one should usually expect imperfectly compensated errors of estimation 
in statistics derived by Fisher’s method of minimum chi-square to be opposite in 
sign and about half as large as those in the corresponding statistics derived by 
means of Deming’s adaptation of the Stephan iterative method. 
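
The three estimates quoted above can be reproduced directly from the means named in the text. The following is a minimal sketch, assuming frequencies of 1 and 2 on the principal diagonal and 4 and 3 on the other diagonal in a sample of 10 (the order within a diagonal does not affect any of the three means); the function and key names are illustrative only. Each estimate of p is half the share of the principal-diagonal mean in the sum of the two diagonal means.

```python
import math


def diagonal_estimates(n11, n12, n21, n22):
    """Estimates of p for a two-by-two table with equal expected margins."""
    means = {
        "ideal_linear": lambda x, y: (x + y) / 2,               # arithmetic mean
        "deming": lambda x, y: 2 * x * y / (x + y),             # harmonic mean
        "fisher": lambda x, y: math.sqrt((x * x + y * y) / 2),  # quadratic mean
    }
    estimates = {}
    for name, mean in means.items():
        principal = mean(n11, n22)
        other = mean(n12, n21)
        # The two diagonals together account for a total proportion of 0.5
        # per cell pair, so p is half the principal diagonal's share.
        estimates[name] = 0.5 * principal / (principal + other)
    return estimates


estimates = diagonal_estimates(1, 4, 3, 2)
```

The resulting values are .15 for the ideal linear estimate, .14 for the harmonic-mean estimate, and approximately .1545 for the quadratic-mean estimate, so that the two departures from .15 are opposite in sign and the second is roughly half as large as the first.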

At this point, it should be emphasized that Fisher does not recommend his 
own method of minimum chi-square in preference to the method of maximum 
likelihood. In fact, he presents the theory of estimation in such a way as to 
imply correctly that the method of maximum likelihood is superior, especially 
in small samples. Other writers have noted the small differences between 
equations of maximum likelihood and those for minimizing chi-square by Fisher’s 
method and some have even derived one set of equations from the other by 
neglecting higher order terms in a Taylor series expansion. These derivations 
are of no interest here because they seem to justify the method of maximum 
likelihood as a simple approximation to some more complicated method. This 
type of justification is both unnecessary and undesirable. It is more useful to 
regard the method of maximum likelihood as an approximation to a method— 
least squares—for which the theory is simpler. 

Skeptical readers who find the foregoing argument unconvincing may be able 
to profit from the following example. Consider the problem of estimating the 
parameter p where 2p is the proportion of white balls in an urn. A sample of 10 
balls is selected and classified by the following process. Each white ball is 
placed in one of the cells on the principal diagonal of a two-by-two table, the 
particular cell being decided by the toss of a coin. A similar method is used for 
non-white balls placed in cells on the other diagonal. Assuming that the results 
of this process are given by equation (7.5), which of the three alternative esti- 
mates of p given above should be preferred? Belief in the general superiority 
of Fisher’s method of minimum chi-square seems to imply that the device of 
coin-tossing described in this example can be used in practical problems involving 
the estimation of the proportion of “successes” to secure estimates which are 
superior to the sample proportion—the ideal linear estimate in such cases. 
Even if it is possible to construct trivial special case examples supporting some 
complicated method for such problems, the general use in practical problems of 
the coin-tossing device in connection with either Fisher’s method of minimum 
chi-square or Deming’s adaptation of the Stephan iterative method would be 
absurd, as this example is intended to emphasize. 


8. The method of proportional distribution of marginal adjustments. The 
method of proportional distribution of marginal adjustments is a general method 
of adjusting sample frequencies so that their row and column totals agree with 
known expected marginal totals. In other words, the adjusted frequency for 
the cell in the ith row and the jth column is given by the equation 


(8.1) $m^{*}_{ij} = n_{ij} + p_{i.} d_{.j} + p_{.j} d_{i.},$


where $d_{i.} = m^{*}_{i.} - n_{i.}$ and $d_{.j} = m^{*}_{.j} - n_{.j}$ 
are the net adjustments in the sample cell frequencies of the ith row and the 
jth column, respectively. The asterisk is used to distinguish the set of statistics 
based on equation (8.1) from the maximum likelihood estimates $m_{ij}$ and the 
ideal linear estimates. 

The method of proportional distribution of marginal adjustments yields ideal 
linear estimates when the universe cell proportions are proportional by rows and 
by columns; i.e., when 


(8.2) $p_{ij} = p_{i.}\, p_{.j}.$


This important principle can be established by substituting in equation (7.4) 
of section 7 the quantities 


(8.3) $c_{ij} = n p_{i.} p_{.j},$
$p_i = 0.5 + d_{i.}/np_{i.},$

and 

$q_j = 0.5 + d_{.j}/np_{.j},$


and reducing the typical equation of the ideal method of minimum chi-square 
to the form of equation (8.1) which defines the method of proportional dis- 
tribution of marginal adjustments. 
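
The method of proportional distribution of marginal adjustments can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed routine: the function name, the table, and the universe proportions are hypothetical, and each net adjustment is taken as the known expected marginal total minus the sample marginal total, so that each adjustment is added to the sample frequency in proportion to the universe proportions of the other classification.

```python
def proportional_adjustment(n_obs, row_p, col_p):
    """Adjust a two-way table of sample frequencies to known expected margins.

    n_obs : list of rows of sample frequencies
    row_p : known universe row proportions
    col_p : known universe column proportions
    """
    r, s = len(n_obs), len(n_obs[0])
    n = sum(sum(row) for row in n_obs)
    # Net adjustments: expected marginal total minus sample marginal total.
    d_row = [n * row_p[i] - sum(n_obs[i]) for i in range(r)]
    d_col = [n * col_p[j] - sum(n_obs[i][j] for i in range(r)) for j in range(s)]
    # Each row adjustment is spread over columns in proportion to col_p,
    # and each column adjustment over rows in proportion to row_p.
    return [
        [n_obs[i][j] + row_p[i] * d_col[j] + col_p[j] * d_row[i] for j in range(s)]
        for i in range(r)
    ]


# Hypothetical 2 x 3 table with n = 100.
n_obs = [[10, 20, 30], [20, 10, 10]]
adjusted = proportional_adjustment(n_obs, [0.55, 0.45], [0.3, 0.3, 0.4])
row_totals = [sum(row) for row in adjusted]
col_totals = [sum(adjusted[i][j] for i in range(2)) for j in range(3)]
```

In this example the adjusted row totals are 55 and 45 and the adjusted column totals are 30, 30, and 40, in exact agreement with the assumed expected marginal totals.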

Even in the absence of exact proportionality, under which it yields fully 
efficient statistics, the method of proportional distribution of marginal adjust- 
ments has the following relative advantages over other available methods: 

(1) ease of extension to tables of higher order; 

(2) exact agreement with known (expected) marginal totals; 

(3) simplicity of interpretation; 

(4) independence of computational errors; 

(5) rapidity of processing; 

(6) economy of effort; and 

(7) fully efficient criteria for testing the significance of departures from proportionality of rows and columns. 

Ease of extension to tables of higher order is a desirable property of the 
method of proportional distribution of marginal adjustments. Equation (8.1) 
applies to the special case in which there are only two bases of classification. 
In the more general case sample observations are cross-classified according to 
r bases of classification, each cell frequency in an rth order table being the num- 
ber of observations in the corresponding rth order class whose expected value 
is to be estimated. The required adjustment for each first order class (obtained 
by subtracting the sample total from its known expected value) is distributed 
among the various cells in proportion to the universe totals of the corresponding 
(r — 1)th order classes to which the cells belong. The general process is il- 
lustrated by 


(8.4) $m^{*}_{ijk} = n_{ijk} + p_{ij.} d_{..k} + p_{i.k} d_{.j.} + p_{.jk} d_{i..},$


the formula for estimating the expected frequency in the general cell of a third 
order table. 
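
The extension to a third order table can be sketched in the same way. The sketch below is an assumption-laden illustration: it takes the adjustment for each first order class (expected total minus sample total) and distributes it in proportion to the universe proportions of the second order classes formed by the other two bases of classification. The function name and data are hypothetical, and the second order proportions are built here, purely for convenience, as products of first order proportions.

```python
def third_order_adjust(n_obs, p_ij, p_ik, p_jk):
    """Adjust a three-way table to known expected first order totals.

    p_ij, p_ik, p_jk are universe proportions of the second order classes;
    each table is assumed to sum to one with mutually consistent margins.
    """
    I, J, K = len(n_obs), len(n_obs[0]), len(n_obs[0][0])
    n = sum(n_obs[i][j][k] for i in range(I) for j in range(J) for k in range(K))
    # First order universe proportions implied by the second order tables.
    p_i = [sum(p_ij[i]) for i in range(I)]
    p_j = [sum(p_jk[j]) for j in range(J)]
    p_k = [sum(p_ik[i][k] for i in range(I)) for k in range(K)]
    # Net adjustments for first order classes: expected minus sample totals.
    d_i = [n * p_i[i] - sum(n_obs[i][j][k] for j in range(J) for k in range(K))
           for i in range(I)]
    d_j = [n * p_j[j] - sum(n_obs[i][j][k] for i in range(I) for k in range(K))
           for j in range(J)]
    d_k = [n * p_k[k] - sum(n_obs[i][j][k] for i in range(I) for j in range(J))
           for k in range(K)]
    return [
        [[n_obs[i][j][k] + p_ij[i][j] * d_k[k] + p_ik[i][k] * d_j[j]
          + p_jk[j][k] * d_i[i] for k in range(K)] for j in range(J)]
        for i in range(I)
    ]


def outer(u, v):
    """Table of products u_a * v_b (used only to build example inputs)."""
    return [[x * y for y in v] for x in u]


# Hypothetical 2 x 2 x 2 table with n = 100 and assumed first order proportions.
pi, pj, pk = [0.5, 0.5], [0.4, 0.6], [0.3, 0.7]
n_obs = [[[10, 15], [12, 18]], [[8, 12], [10, 15]]]
m_star = third_order_adjust(n_obs, outer(pi, pj), outer(pi, pk), outer(pj, pk))
totals_i = [sum(m_star[i][j][k] for j in range(2) for k in range(2)) for i in range(2)]
totals_j = [sum(m_star[i][j][k] for i in range(2) for k in range(2)) for j in range(2)]
totals_k = [sum(m_star[i][j][k] for i in range(2) for j in range(2)) for k in range(2)]
```

The first order totals of the adjusted table agree with the assumed expected totals (50, 50; 40, 60; 30, 70), which illustrates the exact agreement with marginal totals in the higher order case.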

Exact agreement with marginal totals follows easily from the method of 
proportional distribution and can be established algebraically by summing the 
estimation equation by first order classes; e.g., summing equation (8.1) by rows 
and columns. In practice, discrepancies are always either errors of rounding 
or mistakes in computation; they are never due to lack of convergence of iterative 
processes as is often true in alternative methods of estimation. 

Although simplicity of interpretation is desirable in general, it is especially 
important when random sampling is an unrealistic abstraction. For example, 
the method of proportional distribution of marginal adjustments has been used 
to estimate the cell proportions in a two-way classification of incomes from known 
marginal proportions and a detailed cross classification at an earlier date. In 
this problem known shifts in income distributions made it evident that certain 
cells previously vacant should not have the zero proportions which would be 
estimated for them by other available methods of estimation. The ease with 
which the effects of the method of adjustment can be traced is important also 
in the analysis of the results of sample surveys in which various types of bias 
are important. 

The method of proportional distribution of marginal adjustments yields the 
estimated expected frequency for any cell by a single sequence of computations 
which is independent of the corresponding process for any other cell. Errors 
made in computing the estimate for any cell appear in marginal totals of esti- 
mates for all first order classes which include that cell. If only a few errors are 
made in a table they can be localized immediately and can be corrected without 
recomputing any estimates which are correct. 

In certain types of social surveys, rapidity of processing is so important that, 
as Deming puts it, “the delay of only the brief time required for adjustment 
may not be advisable.” [1, p. 102]. Under these conditions, it is important to 
have a simple formula like equation (8.1) in which substitutions can be made 
rapidly. Even when the time element is relatively unimportant, the economy 
of effort and the ease of explaining the method to clerical assistants are often 
of practical importance. 

Finally, departures from proportionality among rows and columns often 
provide the chief element of interest in research studies—not only in social 
surveys of the type illustrated in Deming and Stephan’s example but also in 
biological sciences. The most effective tests of significance for the purpose of 
presenting statistical evidence of lack of proportionality are those based on 
statistics like those derived by the method of proportional distribution of marginal 
adjustments whose efficiency is 100 per cent when proportionality is exact. 
Even when proportionality is not exact, the efficiency of statistics derived 
by proportional distribution may be close to 100 per cent under fairly typical 
problem conditions such as those in the example by Deming and Stephan wherein 
the other more complicated methods require several times as much computational 
effort, but have little advantage over the easier method with respect to effi- 
ciency of estimation in this particular problem. 


9. Suggested improvements in techniques. In section 7, a method was 
outlined by which it is possible to derive sets of maximum likelihood statistics 
by merely integrating available techniques without changing any of them. 
In this section a number of improvements are suggested. At this point it should 
be emphasized that a given change is not an improvement merely because it 
yields slightly more accurate estimates or makes possible a slight saving of 
time and effort. In each case the research worker should consider saving of time 
and effort and accuracy of estimation simultaneously. In particular, it seems 
likely that most social surveys of the type considered by Deming and Stephan 
are characterized by approximate proportionality by rows and by columns— 
conditions relatively favorable to the simple method of proportional distribu- 
tion of marginal adjustments. It should be clearly understood that sug- 
gestions in this section are intended for those research workers whose problems 
justify a great deal more effort than is required to adjust sample frequencies 
by this simple method. 

Assuming that the problem at hand warrants the effort required to derive 
maximum likelihood estimates, the first consideration is the derivation of a 
set of $m_{ij}(1)$, first approximations to the $m_{ij}$, and a set of values of $p_i(1)$, 
first approximations to the $p_i$. Even if proportionality by rows and by columns 
is not closely approximated, use of the values of the $p_i(1)$ provided by equation (8.3) 
is especially to be recommended. In the example used by Deming these 
values for the $p_i(1)$ are so much better than the values recommended by Deming 
that they save a large proportion of the effort required by the iterative process. 
If rows and columns are approximately proportional, equation (8.1) should be 
used to provide values of the $m_{ij}(1)$, in which case it is possible to use an iterative 
process similar to the one used by Deming but based on the typical equation 
of maximum likelihood (7.3) to achieve a given degree of accuracy in the 
maximum likelihood estimates with even less effort. Under favorable conditions 
such as those in Deming’s example the suggested iterative process yields excellent 
approximations to maximum likelihood estimates by means of the following 
steps: 

1. Construct a set of first approximations to the r row components of the rs 
maximum likelihood divisors $(a_i + b_j)$ by means of the equation 


(9.1) $a_i(1) = n_{i.}/np_{i.} - 1/2.$


2. Compute successive approximations to the $a_i$ and $b_j$ by means of the equations 


(9.2) $b_j(g) = \left[n_{.j} - \sum_i m_{ij}(1)\, a_i(g)\right]/np_{.j},$
(9.3) $a_i(g + 1) = \left[n_{i.} - \sum_j m_{ij}(1)\, b_j(g)\right]/np_{i.},$


where $m_{ij}(1)$, the first approximation to $m_{ij}$, is derived by means of equation 
(8.1). Just as in Deming’s iterative process, the expression in brackets is a 
series of products which can be subtracted in a single sequence of machine 
operations and the final division can be performed without having to record 
any of the intermediate results. 

3. Divide the sample frequencies by the maximum likelihood divisors to obtain 
the maximum likelihood estimates 


(9.4) $m_{ij} = n_{ij}/(a_i + b_j),$


where limiting values of $a_i$ and $b_j$ are approximated as closely as desired by 
successive approximations in the preceding equations. 
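
The three steps above can be sketched as follows. This is a minimal illustration and not a definitive implementation: the function names, the fixed number of passes, and the example table are assumptions, and the first approximations $m_{ij}(1)$ are obtained by adding proportionally distributed marginal adjustments (expected minus sample totals) to the sample frequencies. Because the $m_{ij}(1)$ are held fixed during the iteration, the result is in general an approximation to the maximum likelihood estimates.

```python
def first_approximations(n_obs, row_p, col_p):
    """First approximations m_ij(1) by proportional distribution of adjustments."""
    r, s = len(n_obs), len(n_obs[0])
    n = sum(sum(row) for row in n_obs)
    d_row = [n * row_p[i] - sum(n_obs[i]) for i in range(r)]
    d_col = [n * col_p[j] - sum(n_obs[i][j] for i in range(r)) for j in range(s)]
    return [
        [n_obs[i][j] + row_p[i] * d_col[j] + col_p[j] * d_row[i] for j in range(s)]
        for i in range(r)
    ]


def approximate_ml_estimates(n_obs, row_p, col_p, passes=50):
    """Steps 1 to 3: row and column components of the divisors (a_i + b_j)."""
    r, s = len(n_obs), len(n_obs[0])
    n = sum(sum(row) for row in n_obs)
    row_tot = [sum(n_obs[i]) for i in range(r)]
    col_tot = [sum(n_obs[i][j] for i in range(r)) for j in range(s)]
    m1 = first_approximations(n_obs, row_p, col_p)
    # Step 1: first approximations to the row components.
    a = [row_tot[i] / (n * row_p[i]) - 0.5 for i in range(r)]
    b = [0.0] * s
    # Step 2: alternate between column and row components.
    for _ in range(passes):
        b = [(col_tot[j] - sum(m1[i][j] * a[i] for i in range(r))) / (n * col_p[j])
             for j in range(s)]
        a = [(row_tot[i] - sum(m1[i][j] * b[j] for j in range(s))) / (n * row_p[i])
             for i in range(r)]
    # Step 3: divide the sample frequencies by the divisors.
    return [[n_obs[i][j] / (a[i] + b[j]) for j in range(s)] for i in range(r)]


# Hypothetical 2 x 2 table (n = 100) with all expected marginal totals equal to 50.
estimates = approximate_ml_estimates([[30, 20], [25, 25]], [0.5, 0.5], [0.5, 0.5])
```

In this example the first approximations already coincide with the maximum likelihood estimates, so the iteration reproduces them: 27.5 and 22.5 in the first row and 22.5 and 27.5 in the second.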

Under unfavorable conditions, the iterative process of this section is not 
always the easiest way to obtain satisfactory estimates. For example, when 
samples are small and/or rows and columns are not approximately proportional, 
it is better to use the iterative method as originally presented by Stephan, where
sample frequencies can be used for first approximations to the $c_{ij}$ and these may
be replaced by successively better approximations. 

The point made in the final paragraph of Fisher’s well-known book [3] that 
“in practice one need seldom do more than solve, at least to a good approximation,
the equation of maximum likelihood,” is strongly supported by the developments
of this article. In addition, the proof that the method of least squares
and the method of minimum chi-square always lead (by means of approxima- 
tions to ideal weights) to maximum likelihood statistics greatly facilitates the 
adaptation of techniques developed in connection with these hitherto competing 
methods.


A STATISTICAL PROBLEM CONNECTED WITH THE COUNTING OF
RADIOACTIVE PARTICLES


By STEN MALMQUIST
Institute of Statistics, University of Upsala, Sweden


1. Introduction. Our problem refers to random events forming a sequence 
in time or in space, e.g. particles emitted by a radioactive matter. By omitting 
certain elements of the given sequence, say f, we form another sequence, say g. 
The rule of omission involves an arbitrarily prescribed constant u. The rule 
to be followed in forming g is:

Case I: Let a be an element in f and g. The next element to be included
in g is then the first element in f which follows a after a distance greater than u.

Case II: Let a be an element in f and g. The next element to be included in
g is then the first element in f which follows a at a distance greater than u from
the preceding element in f, whether this belongs to g or not.
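
The two rules of omission can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that the first element of f is always taken into g, a starting convention the rules themselves leave open.

```python
def transform_case_one(times, u):
    """Case I: keep an element only if it lies more than u after the last KEPT element."""
    kept = []
    for t in times:
        if not kept or t - kept[-1] > u:
            kept.append(t)
    return kept


def transform_case_two(times, u):
    """Case II: keep an element only if it lies more than u after the PRECEDING element of f."""
    kept = []
    previous = None
    for t in times:
        if previous is None or t - previous > u:
            kept.append(t)
        previous = t  # the preceding element counts whether it was kept or not
    return kept


f = [0.0, 0.1, 0.25, 0.3, 0.6]
g_one = transform_case_one(f, 0.2)   # elements at 0.0, 0.25 and 0.6 survive
g_two = transform_case_two(f, 0.2)   # only the elements at 0.0 and 0.6 survive
```

In the example the element at 0.25 is counted in Case I, because it is more than u after the last counted element, but lost in Case II, because it follows the uncounted element at 0.1 too closely.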

When the events are represented by impulses emitted by a radioactive matter 
and feeding a recorder with a constant resolving time u, the new sequence con- 
sists of the counted impulses. The two cases correspond to the reaction of 
different types of recorders. The distinction between the two transformations 
has caused some confusion. It has, however, been clearly pointed out by 
Ruark and Brammer [5]. 

v. Bortkiewicz [2] seems to be the first who has considered problems related 
to the transformed sequence. Starting from investigations by Rutherford, 
Geiger, and others, concerning the number of recorded α-particles during a
certain interval of time, say T, he observed that the distribution of this number
was similar to that of Poisson but with a slightly smaller dispersion. This fact
he supposed to be caused by a constant resolving time u of the recorder. By
means of certain assumptions he tried to calculate the effect on the mean and
the dispersion by the transformation in Case I, supposing the cumulative distribution
function F(t) for the distance between two consecutive elements in
the sequence f is given by

$F(t) = 1 - e^{-at},$

where here and in what follows, t denotes a non-negative variable.

Considering Case II with F(t) as above, Levert and Scheen [4] have recently 
worked out an expression for the distribution of the number of elements during 
T in the sequence g. 

Gnedenko [3] has considered the distribution of the number of lost elements 
in Case I with particular regard to the initial state of rest. 

Alaoglu and Smith [1] considered problems referring to successive transformations
of a sequence. When, for example, a sequence of particles enters
a tube-counter and amplifier, together acting with a resolving time $u_1$, and
the impulses then are feeding a recorder with resolving time $u_2 > u_1$, the sequence
of recorded impulses will be the result of two successive transformations.
If we have a scaling circuit between the counter and the recorder, we have to
make a transformation of another type between the two transformations in
Case I and Case II.

The present paper deals with the transformed sequence in Case I. The 
distribution function F(t) is supposed to be arbitrary. An advantage of this
generalization is that the formulas derived could be used in treating problems 
referring to successive transformations. 

The author wishes to express his sincere gratitude to Professor Herman Wold 
for stimulating discussions and valuable advice. 


2. Derivation of distributions for case I. Suppose that the sequence f 
has F(t) for distribution function for the distance between two consecutive 
elements. F(t) is supposed to be independent of absolute time (space), and of 
the preceding distance between two elements. When not stated otherwise, 
we further suppose F(0) = 0. 

Now let G(t) be the distribution function for the distance between two consecutive
elements in the transformed sequence g. Evidently G(t) also is independent
of absolute time and of the preceding distance between two elements.

We shall consider certain distribution functions connected with F(t). These 
functions will then be used in solving problems concerning the sequence g. 

Let $F_n(t)$ be the distribution function for the distance between the first and
the last of n + 1 consecutive elements in the sequence f. Then $F_n(t)$ is given
by the recursive system

(1)    $F_{n+1}(t) = \int_0^t F_n(t - x)\, dF(x); \qquad F_1(t) = F(t).$

As is easily seen, we have

$F_{n+1}(t) \le F(t) \cdot F_n(t);$

and, for t = u,

$F_n(u) \to 0$, as $n \to \infty$;

$\sum_{n=1}^{\infty} F_n(u) < \infty$, provided that $F_1(0) < 1$.

Alternatively, $F_n(t)$ could be deduced by the use of characteristic functions.
Still considering the sequence f, let $\Phi(t)$ be the distribution function for the
distance d between an arbitrarily chosen point and the following element.
Suppose that the arbitrary point is chosen so that the distance between the preceding
and the following element is x. Under this condition we have, in usual
symbols,


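$P(d \le t \mid x) = t/x$ for $t \le x$, $\qquad P(d \le t \mid x) = 1$ for $t > x$;

$\Phi(t) = \int_0^{\infty} P(d \le t \mid x)\, dH(x),$

where H(t) is the distribution function for the distance x.
To deduce H(t) we suppose that the distribution F(t) has a finite mean,

$m = \int_0^{\infty} t\, dF(t).$

By the definition of H(t), we then have

$H(t) = \frac{1}{m} \int_0^t x\, dF(x),$

(2)    $\Phi(t) = \frac{1}{m} \left[ \int_0^t x\, dF(x) + t \int_t^{\infty} dF(x) \right].$

The corresponding frequency function $\varphi(t)$ is given by

(3)    $\varphi(t) = \frac{1 - F(t)}{m}.$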

Consider n + 2 consecutive elements in f, say $a_0, a_1, \cdots, a_{n+1}$, where $a_0$
is an element in the transformed sequence g. The probability $P_n$ that the
next element in g following $a_0$ will be $a_{n+1}$ is given by

$P_n = F_n(u) - F_{n+1}(u), \qquad P_0 = 1 - F(u).$

Now let $P_n(t)$ be the probability that the distance between $a_0$ and $a_{n+1}$ is
smaller than or equal to t, when $a_0$ and $a_{n+1}$ are two consecutive elements in the
sequence g. Then

$P_n(t) = \frac{1}{F_n(u) - F_{n+1}(u)} \int_0^u [F(t - x) - F(u - x)]\, dF_n(x), \quad (n = 1, 2, \cdots),$

$P_0(t) = \frac{F(t) - F(u)}{1 - F(u)}.$

Let G*(t) be defined by

$G^*(t) = \sum_{n=0}^{\infty} P_n \cdot P_n(t) = F(t) - F(u) + \sum_{n=1}^{\infty} \int_0^u [F(t - x) - F(u - x)]\, dF_n(x); \quad t > u.$

When G*(t) is a distribution function, then G*(t) equals G(t).





















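For $t_1 < t_2$ we obviously have $G^*(t_1) \le G^*(t_2)$.
For $t = \infty$,

$G^*(\infty) = 1 - F(u) + \sum_{n=1}^{\infty} \int_0^u [1 - F(u - x)]\, dF_n(x) = 1 - F(u) + \sum_{n=1}^{\infty} F_n(u) - \sum_{n=1}^{\infty} F_{n+1}(u) = 1.$

Hence we take

(4)    $G(t) = G^*(t); \quad t > u,$
       $G(t) = 0; \quad t \le u.$

When the corresponding frequency functions g(t) and f(t) exist, we get

(5)    $g(t) = f(t) + \sum_{n=1}^{\infty} \int_0^u f(t - x)\, f_n(x)\, dx; \quad t > u.$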
Dealing with a sequence of elements we are often concerned with the number
of occurrences during a certain time T.
Let the mean number of occurrences during T be M(T). Supposing that
the mean $m = \int_0^{\infty} t\, dF(t)$ is finite and that F(0) < 1, we have

(6)    $M(T) = T/m.$

We define the distribution functions

$K_1(t) = F(t)$ for $t > \epsilon$, $\qquad K_1(t) = 0$ for $t < \epsilon$;
$K_2(t) = F(t)$ for $t > \epsilon$, $\qquad K_2(t) = F(\epsilon)$ for $t < \epsilon$;

and denote the corresponding means by $M_1(T)$ and $M_2(T)$. As is easily seen,

$M_1(\epsilon) \le M(\epsilon) \le M_2(\epsilon).$
Using (2),
Making $N = T/\epsilon$ and summing, we obtain

$M_1(T) = \frac{T}{\int_0^{\infty} x\, dK_1(x)} = \frac{T}{\int_{\epsilon}^{\infty} x\, dF(x) + \epsilon F(\epsilon)},$

$M_2(T) = \frac{T}{\int_0^{\infty} x\, dK_2(x)} = \frac{T}{m - \int_0^{\epsilon} x\, dF(x)}.$

By choosing $\epsilon$ arbitrarily small, we get

$M(T) = T/m.$


Let P(n, T) be the probability that we get n elements in f during a time T. 
Suppose that the first of these elements, $a_1$, comes at $T_0 + x$, and the last,
$a_n$, at $T_0 + x + y$.

We then have

(7)    $P(n, T) = \int_0^T \varphi(x)\, dx \int_0^{T-x} [1 - F(T - x - y)]\, dF_{n-1}(y).$


In (4) and (7) we have equations for the transformation in Case I. Because 
of the general form of F(t), the formulas also can be used when we are concerned 
with successive transformations. It can further be remarked that the trans- 
formation of a sequence of impulses by passing a scaling circuit is expressed by 
the system (1). 


3. Results for a particular form for F(t). The preceding formulas will
now be used for a special distribution function F(t). Suppose that the frequency
function f(t) = dF(t)/dt is equal to the frequency function of the distance
between an arbitrary point and the following element.

From (3) we get

$F'(t) = \frac{1 - F(t)}{m},$

or, when F(0) = 0,

(8)    $F(t) = 1 - e^{-at};$
(9)    $f(t) = a e^{-at}$, where $1/a = m = \int_0^{\infty} t f(t)\, dt.$

By means of the theory of characteristic functions we have

(10)    $f_n(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} [\psi(x)]^n e^{-itx}\, dx; \qquad f_1(t) = f(t),$

where

(11)    $\psi(x) = \int_0^{\infty} e^{itx} a e^{-at}\, dt = \frac{a}{a - ix}.$

Thus

(12)    $f_n(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{a^n}{(a - ix)^n}\, e^{-itx}\, dx.$

For n = 1, we get

(13)    $f_1(t) = a e^{-at} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{a}{a - ix}\, e^{-itx}\, dx.$

By differentiating (13) n - 1 times with respect to a we obtain

$t^{n-1} e^{-at} = \frac{(n - 1)!}{2\pi} \int_{-\infty}^{+\infty} \frac{e^{-itx}}{(a - ix)^n}\, dx.$

Hence, from (12),

(14)    $f_n(t) = \frac{a^n}{(n - 1)!}\, t^{n-1} e^{-at}.$
From (5) we obtain the frequency function for the transformed sequence g

(15)    $g(t) = a e^{-at} + \sum_{n=1}^{\infty} \int_0^u a e^{-a(t - x)}\, \frac{a^n}{(n - 1)!}\, x^{n-1} e^{-ax}\, dx = a e^{au} e^{-at}; \quad t > u,$
        $g(t) = 0; \quad t \le u.$

The mean $m_g$ is given by

$m_g = \frac{1}{a} + u.$


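Remark: Suppose the constant u is allowed to vary independently of t and
that the frequency function of u is $\gamma(u)$; we then obtain

(16)    $m_g = \int_0^{\infty} t\, dt \int_0^{\infty} g(u, t)\, \gamma(u)\, du = \int_0^{\infty} \frac{1}{a}\, \gamma(u)\, du + \int_0^{\infty} u\, \gamma(u)\, du = \frac{1}{a} + m(u).$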

Now let the sequence of elements, g, by means of (5) be transformed into a 
new sequence, h. When we are concerned with the counting of particles, 
emitted from a radioactive matter, let the sequence g consist of impulses from 
a counter-amplifier with resolving time u, feeding a recorder with resolving 
time $u_1$. Then the elements in h are the counted impulses, it being supposed
that the tube-counter and the recorder react according to the assumptions.

We suppose $u_1 > u$. When $u_1 \le u$, the sequences g and h are identical.

Let $g_n(t)$ denote the frequency function of the distance between the first and
the last of n + 1 consecutive elements in g. We find, in the same way as
used in obtaining (14),

(17)    $g_n(t) = \frac{a^n}{(n - 1)!}\, e^{-a(t - nu)} (t - nu)^{n-1}.$

Let h(t) be the frequency function for the distance between two consecutive
elements in the sequence h. Let further N be the greatest integer smaller
than or equal to $u_1/u$.
Using (4) and (5) we obtain

(18)    $h_{\rm I}(t) = a e^{-at} e^{au} \sum_{n=0}^{N} \frac{a^n}{n!} (u_1 - nu)^n e^{anu}; \quad t > u_1 + u;$

        $h_{\rm II}(t) = a e^{-at} e^{au} \sum_{n=0}^{N} \frac{a^n}{n!} [t - (n + 1)u]^n e^{anu}; \quad (N + 1)u \le t \le u_1 + u;$

        $h_{\rm III}(t) = a e^{-at} e^{au} \sum_{n=0}^{N-1} \frac{a^n}{n!} [t - (n + 1)u]^n e^{anu}; \quad u_1 < t < (N + 1)u.$
0 e 


The mean $m_h$ is found to be


(19) m = |: + v|[a+ = = (um — = pent 


n=l v=n 
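We also have

$\int_{u_1 + u}^{\infty} t\, h_{\rm I}(t)\, dt < m_h < \int_{u_1}^{\infty} t\, h_{\rm I}(t)\, dt,$

$\left[ \frac{1}{a} + u_1 + u \right] e^{-a u_1} \sum_{n=0}^{N} \frac{a^n}{n!} (u_1 - nu)^n e^{anu} < m_h < \left[ \frac{1}{a} + u_1 \right] e^{-a(u_1 - u)} \sum_{n=0}^{N} \frac{a^n}{n!} (u_1 - nu)^n e^{anu}.$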


We now consider the number of occurrences during a time interval T. Using
(6), (16), and (19) we immediately get the mean numbers of occurrences during T.
By (3), we get for the sequence g

(20)    $\varphi_g(t) = \frac{a}{au + 1}; \quad t \le u,$
        $\varphi_g(t) = \frac{a}{au + 1}\, e^{-a(t - u)}; \quad t > u.$

Inserting (20), (15) and (14) in (7) and evaluating the integrals, we finally get


Ou-1 — 2an + Gap; n<=—1 


_ aT z. 
— > n ) <n 


aT 
a. lewrn- ail 


‘Sne + 1. 


<i 
"/ 


(21) P,(n, T) = 
An-1 2| n — 
where
a —a(T—nu) . (T oe nu)’ a” = 
a . = (n — v), (n = 0,1,--- 
(22) ono v! ) 


a_-1 


When u = 0 we obtain

$a_n = e^{-aT} \sum_{\nu=0}^{n} \frac{T^{\nu} a^{\nu}}{\nu!} (n - \nu).$

For the sequence f we then get the Poisson distribution

(23)    $P_f(n, T) = \frac{(aT)^n}{n!}\, e^{-aT}.$
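
The passage from the quantities $a_n$ to the Poisson distribution can be checked numerically. The sketch below assumes the second-difference form $a_{n-1} - 2a_n + a_{n+1}$ that opens (21), takes u = 0, and additionally assumes $a_{-1} = 0$; the function names are illustrative only.

```python
import math


def a_coeff(n, a, T):
    """a_n for u = 0: exp(-aT) * sum over v of (aT)^v / v! * (n - v); taken as 0 for n < 0."""
    if n < 0:
        return 0.0
    return math.exp(-a * T) * sum(
        (a * T) ** v / math.factorial(v) * (n - v) for v in range(n + 1)
    )


def second_difference(n, a, T):
    """a_{n-1} - 2 a_n + a_{n+1}, the combination appearing in (21)."""
    return a_coeff(n - 1, a, T) - 2.0 * a_coeff(n, a, T) + a_coeff(n + 1, a, T)


def poisson(n, a, T):
    """Right-hand side of (23)."""
    return (a * T) ** n / math.factorial(n) * math.exp(-a * T)
```

The first difference $a_{n+1} - a_n$ is the cumulative Poisson sum up to n, so the second difference isolates the single term (23).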


The corresponding expression for the sequence h is much more complicated. 


4. A statistical experiment. The following statistical experiment will serve 
as an illustration of the scheme dealt with in this paper—the transformation of 
a sequence and the resulting formulas, especially (21). 

Groups of five figures, the last rounded up if necessary, have been extracted 
from tables of random sampling numbers [6]. Let each group denote the first
five digits for a decimal x, arbitrarily chosen between 0 and 1. The variable
x is supposed to have the distribution function t for 0 < t < 1. We now define
a new variable, y, given by

(24)    $y = -k \log (1 - x)$, [or $y = -k \log x$].

The variable y has the distribution function given by (8), viz.

$F(t) = 1 - e^{-at}$, where $1/a = m = k \log e$.
a 


Transforming each group, or number x, according to (24), we get a sample of
consecutive distances between elements in the sequence f considered in the
previous sections. Choosing a constant u, we can construct the corresponding
sequence g. Beginning with a point, arbitrarily chosen on the first distance, 
we can finally count the number of elements in successive intervals of the same 
length. 
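
The experiment can be imitated with a pseudo-random generator in place of printed tables of random sampling numbers. This is only a sketch: the function name, the sample size and the seed are arbitrary choices, common logarithms are assumed in (24) because the text uses log e = 0.4343, and the sequence g is started at the first element of f rather than at an arbitrary point of the first distance.

```python
import math
import random


def simulate_experiment(n_distances=20000, k=1.0, u=0.2, seed=1947):
    """Return the observed mean distances in f and in the Case I sequence g."""
    rng = random.Random(seed)
    # Equation (24): y = -k log10(1 - x) turns a uniform x into an exponential distance.
    distances = [-k * math.log10(1.0 - rng.random()) for _ in range(n_distances)]

    # Positions of the elements of f.
    times, position = [], 0.0
    for d in distances:
        position += d
        times.append(position)

    # Case I: an element is counted only if it lies more than u after the last counted one.
    kept = [times[0]]
    for t in times[1:]:
        if t - kept[-1] > u:
            kept.append(t)

    g_distances = [later - earlier for earlier, later in zip(kept, kept[1:])]
    mean_f = sum(distances) / len(distances)
    mean_g = sum(g_distances) / len(g_distances)
    return mean_f, mean_g


mean_f, mean_g = simulate_experiment()
```

With k = 1 and u = 0.2 the observed means should lie close to the theoretical values $1/a = 0.4343$ and $1/a + u = 0.6343$.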

Take k = 1, u = 0.2 and T = 1.5. We then have for the sequences f and g:

$m_f = \frac{1}{a} = \log e = 0.4343; \qquad m_g = \frac{1}{a} + u = 0.6343;$

$\sigma_f = \frac{1}{a} = 0.4343; \qquad \sigma_g = \frac{1}{a} = 0.4343;$

$M_f(T) = \frac{T}{m_f} = 3.454; \qquad M_g(T) = \frac{T}{m_g} = 2.365.$

The experiment yielded the following results:

For the sequence f:  Number of elements 801.  $\bar{m}_f = 0.450.$
For the sequence g:  Number of elements 555.  $\bar{m}_g = 0.648.$

In neither case is the deviation between the observed and theoretical means
statistically significant. In fact we have:

$\frac{(\bar{m}_f - m_f)\sqrt{800}}{\sigma_f} = 1.0; \qquad \frac{(\bar{m}_g - m_g)\sqrt{554}}{\sigma_g} = 0.8,$

which gives P = 0.3 and P = 0.4, respectively.


TABLE 1 


Nos. of intervals with n elements


Sequence f | Sequence g 








| Expected | | Expected | Expected 
Observed | according | Observed | according according 
| to (23) to (21) to (23) 


7S Ot | gs | 2.7 
26.1 | | 42.5 54.8 
45.1 | 81.8 63.3 
51.9 =. * 48.8 
44. | ' 28.1 
31. | 5 | 13.0 


| wy. | ; 5.0 
| . 2.4 





+>) 
— 


ee 





Mean | | 454 


| 
= 





825 | | 4.524 








68 | 0.34 | <0.001 


The functions $a_n$ in (22) can be calculated by means of Pearson's tables of
the incomplete Γ-function [7]. In the notation of these tables we obtain


d 
pose - 2) = (9, 0). 


Hence 


— n—X 
Ci OO 
n! aut+1 (p, I 
where
r 
o* i a q=n — 2. 


In the present case, however, we only need the numbersup toa;. Accordingly, 
the a, have been calculated directly. 

The resulting theoretical and observed distributions for the number of elements
during T for the sequences f and g will be found in Table I. For comparison,
a Poisson distribution, with the same mean as observed for the sequence
g, is given. The result of a $\chi^2$ test is also shown in Table I. Judged by the $\chi^2$
test the distributions (23) and (21) agree fairly well with the observed distributions.
As was to be expected, the Poisson distribution cannot be used for
the sequence g.


= a(T — nu); 


THE PROBABILITY FUNCTION OF THE PRODUCT OF TWO NORMALLY
DISTRIBUTED VARIABLES¹


By Leo A. AROIAN
Hunter College


1. Introduction and summary. Let x and y follow a normal bivariate probability
function with means X, Y, standard deviations $\sigma_1$, $\sigma_2$, respectively, r
the coefficient of correlation, and $\rho_1 = X/\sigma_1$, $\rho_2 = Y/\sigma_2$. Professor C. C.
Craig [1] has found the probability function of $z = xy/\sigma_1\sigma_2$ in closed form as
the difference of two integrals. For purposes of numerical computation he has
expanded this result in an infinite series involving powers of z, $\rho_1$, $\rho_2$, and Bessel
functions of a certain type; in addition, he has determined the moments, seminvariants,
and the moment generating function of z. However, for $\rho_1$ and $\rho_2$
large, as Craig points out, the series expansion converges very slowly. Even
for $\rho_1$ and $\rho_2$ as small as 2, the expansion is unwieldy. We shall show that as
$\rho_1$ and $\rho_2 \to \infty$, the probability function of z approaches a normal curve and in
case r = 0 the Type III function and the Gram-Charlier Type A series are excellent
approximations to the z distribution in the proper region. Numerical integration
provides a substitute for the infinite series wherever the exact values of
the probability function of z are needed. Some extensions of the main theorem
are given in section 5 and a practical problem involving the probability function
of z is solved.


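2. Theorems on approach to normality. The moment generating function
of z, $M_z(\theta)$, is [1]

(2.1)    $M_z(\theta) = \frac{\exp \left\{ \frac{(\rho_1^2 + \rho_2^2 - 2r\rho_1\rho_2)\theta^2 + 2\rho_1\rho_2\theta}{2[1 - (1 + r)\theta][1 + (1 - r)\theta]} \right\}}{\sqrt{[1 - (1 + r)\theta][1 + (1 - r)\theta]}}.$

Let $\bar{z}$ and $\sigma_z$ be the mean and the standard deviation of z, and $t_z = (z - \bar{z})/\sigma_z$.
Now

(2.2)    $\bar{z} = \rho_1\rho_2 + r, \qquad \sigma_z = \sqrt{\rho_1^2 + \rho_2^2 + 2r\rho_1\rho_2 + 1 + r^2}.$

Using (2.2) we find in the usual way the moment generating function of $t_z$

(2.3)    $M_{t_z}(\theta) = \frac{\exp \left\{ \frac{-2r\omega + (\rho_1^2 + \rho_2^2 + 2r\rho_1\rho_2)\omega^2 + 4r^2\omega^2 - 2\omega^3(r^2 - 1)(\rho_1\rho_2 + r)}{2[1 - (1 + r)\omega][1 + (1 - r)\omega]} \right\}}{\sqrt{[1 - (1 + r)\omega][1 + (1 - r)\omega]}},$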


where $\omega = \theta/\sigma_z$.


1 Presented to the American Mathematical Society, Oct. 28, 1944, New York City.
Consider $r \ge 0$. Then in the limit as $\rho_1$ and $\rho_2 \to \infty$ in any manner whatever,

(2.4)    $\lim_{\rho_1, \rho_2 \to \infty} M_{t_z}(\theta) = e^{\theta^2/2},$

and by the theorem of Curtiss [2] on moment generating functions we see in
the limit as $\rho_1, \rho_2 \to \infty$ the probability function of z approaches a normal curve
with mean $\bar{z}$ and variance $\sigma_z^2$, $r \ge 0$.
In case $-1 + \epsilon < r < 0$, $\epsilon > 0$, some care is required wherever

$\sqrt{\rho_1^2 + \rho_2^2 + 2\rho_1\rho_2 r}$

occurs. If one uses $\rho_1^2 + \rho_2^2 \ge 2\rho_1\rho_2$, the proof goes forward quite readily.
Hence we have proved the theorem:

THEOREM (2.5). The distribution of z approaches normality with mean $\bar{z}$
and variance $\sigma_z^2$ as $\rho_1$ and $\rho_2 \to \infty$ in any manner whatever, $-1 + \epsilon < r \le 1$,
$\epsilon > 0$.

It is evident in Theorem (2.5) we may allow $\rho_1, \rho_2 \to -\infty$ without any other
changes. Theorems (2.6) and (2.7) are proved in essentially the same way
as (2.5).

THEOREM (2.6). The distribution of z approaches normality with mean $\bar{z}$
and variance $\sigma_z^2$, if $\rho_1 \to \infty$, $\rho_2 \to -\infty$, $-1 \le r < 1 - \epsilon$, $\epsilon > 0$.

THEOREM (2.7). The distribution of z approaches normality if $\rho_1$ remains
constant, $\rho_2 \to \infty$, $-1 + \epsilon < r \le 1$, $\epsilon > 0$; or if $\rho_1$ remains constant, $\rho_2 \to -\infty$,
$-1 \le r < 1 - \epsilon$, $\epsilon > 0$.

Naturally in any of the theorems $\rho_1$ and $\rho_2$ may be interchanged. In practice
$\rho_1$ and $\rho_2$ are usually positive. The approach to normality is more rapid if
both $\rho_1$ and $\rho_2$ have the same sign as r.


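3. Numerical values. In order to show how closely the Type III and the
Gram-Charlier Type A series approximate the probability function of z, f(z),
or more precisely $f(z, \rho_1, \rho_2, r)$, we use numerical integration where

$f(z, \rho_1, \rho_2, r) = I_1(z) - I_2(z),$

(3.1)    $I_1(z) = \frac{1}{2\pi\sqrt{1 - r^2}} \int_0^{\infty} \exp \left\{ -\frac{1}{2(1 - r^2)} \left[ (x - \rho_1)^2 - 2r(x - \rho_1)\left( \frac{z}{x} - \rho_2 \right) + \left( \frac{z}{x} - \rho_2 \right)^2 \right] \right\} \frac{dx}{x},$

and $I_2(z)$ is the integral of the same function over $(-\infty, 0)$, [1]. Now $I_1(z)$
may be written as

(3.2)    $I_1(z) = \frac{1}{\sqrt{1 - r^2}} \int_0^{\infty} \varphi(t_1)\, \varphi(t_2)\, B(t_3)\, \frac{dx}{x},$

where

$\varphi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}, \qquad t_1 = \frac{x - \rho_1}{\sqrt{1 - r^2}}, \qquad t_2 = \frac{z/x - \rho_2}{\sqrt{1 - r^2}},$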

$B(t_3) = e^{t_3}, \qquad t_3 = r t_1 t_2.$
We readily obtain $I_1(z)\sqrt{1 - r^2}$ by forming the product of $\varphi(t_1)$, $\varphi(t_2)$, $B(t_3)$,
and 1/x using numerical integration applying Weddle's formula, the Gregory-Newton
formula, or the simple rectangular formula depending on circumstances.
The rectangular formula [3] is remarkably accurate when the function $T =
\varphi(t_1)\varphi(t_2)B(t_3)/x$ in the interval 0 to $\infty$ or 0 to $-\infty$ is somewhat symmetrical.
Appropriate tables for $\varphi(t_1)$, $\varphi(t_2)$ (see [4]), $B(t_3)$ (see [5]) and 1/x (see [6]) are
readily available. In the important case of the independence of x and y, r = 0
and (3.2) becomes

(3.3)    $I_1(z) = \int_0^{\infty} \varphi(t_1)\, \varphi(t_2)\, \frac{dx}{x}, \qquad t_1 = x - \rho_1, \quad t_2 = \frac{z}{x} - \rho_2.$
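
A rectangular-rule evaluation of f(z) in the independent case can be sketched as follows. The function names, the integration range and the number of steps are illustrative assumptions, and the two integrals are combined into one integral over both signs of x with weight 1/|x|, which equals the difference of the integral over positive x and the integral over negative x.

```python
import math


def phi(t):
    """Standard normal ordinate."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def product_density(z, rho1, rho2, half_width=10.0, steps=1000):
    """f(z) for r = 0 by the midpoint (rectangular) rule in x.

    x runs over rho1 - half_width .. rho1 + half_width; beyond that
    range phi(x - rho1) is negligible.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = rho1 - half_width + (i + 0.5) * h
        if x == 0.0:
            continue  # the integrand vanishes in the limit x -> 0 when z != 0
        total += phi(x - rho1) * phi(z / x - rho2) / abs(x)
    return total * h


# Ordinate near the mean of z for rho1 = rho2 = 4.
density_at_mean = product_density(16.0, 4.0, 4.0)
```

Summing the ordinates over a grid of z values gives a total probability close to 1 and a mean close to $\rho_1\rho_2$, which is a convenient check on the chosen step sizes.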


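4. Approximations to f(z). When r = 0, the standard seminvariants $\xi_3$
and $\xi_4$ of z are

(4.1)    $\xi_3 = \frac{6\rho_1\rho_2}{(\rho_1^2 + \rho_2^2 + 1)^{3/2}}, \qquad \xi_4 = \frac{6[2(\rho_1^2 + \rho_2^2) + 1]}{(\rho_1^2 + \rho_2^2 + 1)^2},$

remembering

$\bar{z} = \rho_1\rho_2, \qquad \sigma_z = \sqrt{\rho_1^2 + \rho_2^2 + 1}.$

In the Pearson system (see [7]) $\delta$, the criterion, is

(4.2)    $\delta = \frac{2\xi_4 - 3\xi_3^2}{\xi_4 + 6},$

and for the probability function of z

(4.3)    $\delta = \frac{2(\rho_1^2 + \rho_2^2 + 1)\{2(\rho_1^2 + \rho_2^2) + 1\} - 18\rho_1^2\rho_2^2}{(\rho_1^2 + \rho_2^2 + 1)[(\rho_1^2 + \rho_2^2 + 1)^2 + 2(\rho_1^2 + \rho_2^2) + 1]},$

and if $\rho_1 = \rho_2 = \rho$,

(4.4)    $\delta = \frac{2(4\rho^2 + 1)(2\rho^2 + 1) - 18\rho^4}{(2\rho^2 + 1)[(2\rho^2 + 1)^2 + (4\rho^2 + 1)]}.$

Now $\delta = 0$, $\xi_3 \ne 0$, for the Type III function, and clearly $\lim_{\rho_1, \rho_2 \to \infty} \delta = 0$.
By use of (3.3) the accurate values of f(z) have been calculated for various combinations
of $\rho_1$ and $\rho_2$ and compared with the Type III approximation using $\bar{z}$,
$\sigma_z$, $\xi_3$.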
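
The quantities tabulated for r = 0 follow directly from (4.1) and (4.2). The helper below is a sketch only; its name and return order are hypothetical, and the symbols xi3 and xi4 stand for the two standard seminvariants of z.

```python
import math


def product_summary(rho1, rho2):
    """Mean, standard deviation, standard seminvariants and Pearson criterion of z for r = 0."""
    s = rho1 ** 2 + rho2 ** 2 + 1.0
    mean = rho1 * rho2
    sigma = math.sqrt(s)
    xi3 = 6.0 * rho1 * rho2 / s ** 1.5                          # (4.1)
    xi4 = 6.0 * (2.0 * (rho1 ** 2 + rho2 ** 2) + 1.0) / s ** 2  # (4.1)
    delta = (2.0 * xi4 - 3.0 * xi3 ** 2) / (xi4 + 6.0)          # (4.2)
    return mean, sigma, xi3, xi4, delta


mean, sigma, xi3, xi4, delta = product_summary(2.0, 4.0)
```

For instance, $\rho_1 = 0$, $\rho_2 = 2$ gives $\sigma_z = 2.236068$, $\xi_4 = 2.160$ and $\delta = .529$, in agreement with the tabulated cell.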

(4.5) Investigations so far completed show that for $\rho_1 \ge 4$ and $\rho_2 \ge 4$ simultaneously,
and $|\delta| < .008$, the Type III approximation will provide values
of $t_z$ correct to three significant figures at least where

(4.6)    $\int_{t_z^{(1)}}^{\infty} f(t_z)\, dt_z = \alpha, \qquad \int_{-\infty}^{t_z^{(2)}} f(t_z)\, dt_z = \alpha, \quad$ and $.05 \ge \alpha \ge .005.$

These are the values of $t_z$ which would be needed in testing hypotheses. The
exact values of $t_z^{(1)}$ and $t_z^{(2)}$ for various values of $\rho_1$ and $\rho_2$ less than 4 will be
determined, it is hoped, in the future and will be published along with the comparisons
of the Type III values of $t_z$ with the accurate values of $t_z$ in the important
borderline cases of $\rho_1 = \rho_2 = 2$, and $\rho_1 = \rho_2 = 3$. The values of f(z)
for $\rho_1 = \rho_2 = 2$ and $\rho_1 = \rho_2 = 4$ have been calculated but these are being withheld
for a more complete table. The table of values of $\bar{z}$, $\sigma_z$, $\xi_3$, $\xi_4$, and $\delta$
(Table II) shows then that the Type III function is excellent along a band about
$\rho_1 = \rho_2$, since $\xi_3 \ne 0$, and $\delta$ is very small.

We use the Gram-Charlier Type A series of three terms to approximate the
probability function of z in $t_z$ units.

(4.7)    $f(t_z) \sim \varphi(t_z) - \frac{\xi_3}{3!}\, \varphi^{(3)}(t_z) + \frac{\xi_4}{4!}\, \varphi^{(4)}(t_z),$

in the usual notation.




TABLE I
$\rho_1 = 0$, $\rho_2 = 10$

   t_z      | f(t_z) Correct value | Normal Curve | Gram-Charlier Type A
  .9950372  | .2406367             | .2431716     | .2408235
 1.4925558  | .1275209             | .130970      | .127484
 1.9900744  | .0538243             | .0550708     | .053704
 2.4875930  | .0184606             | .0180791     | .0184500
 2.9851116  | .0052477             | .0046338     | .0052944
 3.4826302  | .0012609             | .0009272     | .0012804
 3.9801488  | .0002611             | .0001449     | .000260
 4.4776674  | .0000467             | .0000177     | .0000425
 4.9751860  | .00000745            | .00000168    | .00000555


(4.8) For $|\xi_3| < .6$ and $\xi_4 < .4$ simultaneously the Gram-Charlier Type A
series is quite adequate for finding probability levels such as those of (4.6).
These will in general give 3 significant figures for $t_z^{(1)}$ or $t_z^{(2)}$. In the special case
$\rho_1 = 0$, $\rho_2 = 10$, the Gram-Charlier Type A series differs from $f(t_z)$ very slightly
in the range $1 \le |t_z| < \infty$ (see Table I). Naturally the Gram-Charlier will
be used wherever Type III is not indicated, although there exist some overlapping
regions where either one may be used. It should be noticed that the
approach of f(z) to normality is more rapid along a row than down a diagonal.
In case either $\rho_1$ or $\rho_2$ is negative, we may make use of the equation

(4.9)    $f(z, -\rho_1, \rho_2, r) = f(-z, \rho_1, \rho_2, -r).$

We note that when r = 0, $f(z, \rho_1, \rho_2)$ always possesses a discontinuity at z = 0,
(see [1]). A table of $\bar{z}$, $\sigma_z$, $\xi_3$, $\xi_4$, and $\delta$ is provided for values of $\rho_1$ and $\rho_2$ from
0 to 10 inclusive.

TABLE II*


6 





0 0 
2.236068 4.123106 6.082762 . 062258 .049876 
0 0 
2.160 .685121 319942 | .183195 . 118224 
.529 .205 .101 .059 .039 








4 | 

3 | 4.582576 | 6.403124 | 8.306624 246951 

8 498784 | .274256 | .167493 | 111531 

1.259259 .557823 | = .289114 172653 | 113742 
056 | .056 .042 .031 








16. | 
5.744563 280110 | 9. | 10.816654 
. 506408 .373206 | — .263374 . 189641 
| .358127 .224279 | 147234 . 102126 
| —.0084 | .0049 | .014 .016 











| 36. 3. 
8.544004 | 10.049876 .704700 
.346314 | .28873. | «224503 
. 163258 .118224 | 087272 
| —.0054 | —.00083 | 0038 
.357817 | 12.845233 
262088 226472 
.092663 .072507 
.0034 — .0015 














100. 
14. 177447 
210551 
059553 
— .0023 








* The first value in a cell is $\bar{z}$, the second $\sigma_z$, the third $\xi_3$, the fourth $\xi_4$, the
fifth $\delta$.

5. Some extensions. We may generalize our results to any case where x
and y are distributed approximately in a normal distribution such as the distribution
of the product of two means, when the sizes of the samples $N_1$ and $N_2$
are large and consequently $\rho_1$ and $\rho_2$ will be large. Another example occurs if
x and y each follows a Bernoulli probability function with parameters $p_1$ and
$p_2$ respectively where the number of trials in each case is large. We must warn
the reader that the condition $\rho_1 \to \infty$, $\rho_2 \to \infty$ alone does not mean that the distribution
of z approaches normality. Both x and y must be distributed normally.

The actual problem which gave rise to this investigation was the question
of determining the sum of a great many variates [8]. Let T variates $v_1, v_2,
\cdots, v_T$ be given whose sum $A = \sum_{i=1}^{T} v_i$ is desired. Clearly

$A = T\bar{V}, \qquad \bar{V} = \sum_{i=1}^{T} v_i / T.$

Now let us estimate A by $\hat{A} = T_e \bar{V}_e$ where $T_e$ is an estimate of T and $\bar{V}_e$ is an
estimate of $\bar{V}$. If $\sigma_{T_e}$ is very small, $\rho_1 = T/\sigma_{T_e}$ will be large and $\rho_2 = \bar{V}/\sigma_{\bar{V}_e}
= \sqrt{N}\,\bar{V}/\sigma_v$ will be very large. Assuming $T_e$ is distributed normally and
obviously $\bar{V}_e$ is distributed normally for N large, we see by the theorems of this
paper that $\hat{A}$ will be distributed normally. Confidence limits for A may be
calculated in the usual fashion as $\hat{A} \pm \gamma\sigma_{\hat{A}}$, where $\gamma$ is determined by

$\int_{\gamma}^{\infty} \varphi(t)\, dt = \alpha,$

with $\alpha$ generally chosen as .025 or less and

$\sigma_{\hat{A}} = \sqrt{T_e^2 \sigma_{\bar{V}_e}^2 + \bar{V}_e^2 \sigma_{T_e}^2 + \sigma_{T_e}^2 \sigma_{\bar{V}_e}^2}.$
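
The confidence limits can be computed directly from the two estimates and their standard errors. The sketch below is illustrative: the function and argument names are hypothetical, the two estimates are assumed independent, and the default multiplier 1.96 is the normal deviate for a tail area of .025.

```python
import math


def product_confidence_limits(t_est, v_est, sd_t, sd_v, gamma=1.96):
    """Limits for A = T * V-bar from independent estimates of T and V-bar."""
    a_hat = t_est * v_est
    # Standard error of the product of two independent estimates.
    sd_a = math.sqrt(
        t_est ** 2 * sd_v ** 2 + v_est ** 2 * sd_t ** 2 + sd_t ** 2 * sd_v ** 2
    )
    return a_hat - gamma * sd_a, a_hat + gamma * sd_a


lower, upper = product_confidence_limits(1000.0, 2.5, 10.0, 0.05)
```

In this made-up example the estimate of A is 2500 and the squared standard error is 2500 + 625 + 0.25 = 3125.25, so the last term, the product of the two variances, contributes very little when both coefficients of variation are small.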


Stratification is also possible. It is interesting to note that many functions which 
occur in life insurance are products. Such applications will be treated fully 
elsewhere. Naturally the critical region whether both tails or one tail of the 
distribution should be used depends on the alternatives to the hypothesis being 
tested. 

Generalizations of the main theorem are possible for the probability function 
of $z = \prod_{i=1}^{n} x_i$ where $x_1, x_2, \cdots, x_n$ follow a multivariate normal probability
function. These will be investigated in a later paper. It may be noted that
J. B. S. Haldane has investigated the distribution of a product along different
lines [9].


NOTES


This section is devoted to brief research and expository articles on methodology
and other short items.

A REMARK ON CHARACTERISTIC FUNCTIONS
By A. ZYGMUND
University of Pennsylvania
1. Let F(x), $-\infty < x < +\infty$, be a distribution function, and


$\varphi(t) = \int_{-\infty}^{+\infty} e^{itx}\, dF(x)$


its characteristic function. It is well known that the existence of $\varphi'(0)$ does
not imply the existence of the absolute moment


(1)    $\int_{-\infty}^{+\infty} |x|\, dF(x).$


A simple example is provided by the function

$\varphi(t) = C \sum_{n=2}^{\infty} \frac{\cos nt}{n^2 \log n},$

where C is a positive constant. Since the series on the right differentiated term
by term converges uniformly (see [1]), $\varphi'(t)$ exists (and is continuous) for all
values of t, and in particular at the point t = 0. Obviously $\varphi(t)$ is the characteristic
function of the masses $C/(2n^2 \log n)$ concentrated at the points $\pm n$
for n = 2, 3, $\cdots$. The constant C is such that the sum of all the masses is 1.
The divergence of the series $\sum 1/(n \log n)$ implies that in this particular case the
moment (1) is infinite.
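
The slow divergence in this example can be seen numerically. The sketch below is only an illustration with a hypothetical function name: it sums the contributions to the absolute moment from the points $\pm n$ up to a cutoff, divided by the constant C. Each pair of points contributes $2 \cdot n \cdot C/(2n^2 \log n) = C/(n \log n)$.

```python
import math


def truncated_absolute_moment_over_c(x_max):
    """Sum of |x| times mass over the points +-n, 2 <= n <= x_max, divided by C."""
    return sum(1.0 / (n * math.log(n)) for n in range(2, x_max + 1))


low = truncated_absolute_moment_over_c(1000)
high = truncated_absolute_moment_over_c(100000)
```

The partial sums grow like log log of the cutoff, so they increase without bound although very slowly, while the symmetric moment of this distribution is zero for every cutoff because the masses at n and at -n are equal.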

In a recent paper (see [2], esp. p. 120, footnote), Fortet raises the problem of 
whether the existence of $\varphi'(0)$ implies the existence of the first algebraic moment

(2)    $\int_{-\infty}^{+\infty} x\, dF(x) = \lim_{X \to +\infty} \int_{-X}^{X} x\, dF(x).$

The main purpose of this note is to show that this is so. We shall even prove
a slightly more general result.
A function $\psi(t)$ defined in the neighborhood of a point $t_0$ is said to be smooth
at this point if

$\lim_{h \to +0} \frac{\psi(t_0 + h) + \psi(t_0 - h) - 2\psi(t_0)}{h} = 0.$

Clearly, if $\psi$ has a one-sided derivative at the point $t_0$, the derivative on the
other side also exists and has the same value. Thus the graph of $\psi(t)$ has no
angular point for $t = t_0$, and this explains the terminology. If $\psi'(t_0)$ exists and
is finite, $\psi(t)$ is smooth for $t = t_0$. The converse is obviously false, since any
function whose graph is symmetric with respect to $t = t_0$ is smooth at that
point.

THEOREM 1. If the characteristic function $\varphi(t)$ is smooth at the point 0, then
a necessary and sufficient condition for the existence of $\varphi'(0)$ is the existence of the
moment (2). The value of (2) is $-i\varphi'(0)$.

In particular, the existence and finiteness of $\varphi'(0)$ implies the existence of (2).
That the converse is false, is obvious. For if $a_0, a_1, a_2, \cdots$ are positive numbers
and $a_0 + 2a_1 + 2a_2 + \cdots = 1$, then $\varphi(t) = a_0 + 2\sum_{n=1}^{\infty} a_n \cos nt$ is the
characteristic function of the distribution function F(x) corresponding to masses
concentrated at the integer points $\pm n$ and having the values $a_n$ there. Owing
to the symmetry of the masses, the number (2) exists, and is zero even if $\varphi(t)$
is non-differentiable for t = 0 (we may e.g. take for $\varphi(t)$ the Weierstrass non-differentiable
function $C \sum a^n \cos b^n t$, where C is a suitable constant).

PROOF. We may write

$\varphi(t) = \int_0^{\infty} \cos xt\, dG(x) + i \int_0^{\infty} \sin xt\, dH(x) = \psi_1(t) + i\psi_2(t),$

where

$G(x) = F(x) - F(-x), \qquad H(x) = F(x) + F(-x).$

Thus

(3)    $0 \le |\Delta H| \le \Delta G.$

Since $\varphi(t)$ is smooth at the point 0, and since $\psi_1(t)$ is even, $\psi_2(t)$ odd,

$0 = \lim_{h \to +0} \frac{\varphi(h) + \varphi(-h) - 2\varphi(0)}{h} = 2 \lim_{h \to +0} \frac{\psi_1(h) - \psi_1(0)}{h} = -2 \lim_{h \to +0} \int_0^{\infty} \frac{1 - \cos xh}{h}\, dG(x),$

so that, replacing h by 2h,

$\int_0^{\infty} \frac{\sin^2 xh}{h}\, dG(x) \to 0$ as $h \to 0$.

Since the integrand is positive we obtain successively

$\int_0^{1/h} \frac{\sin^2 xh}{h}\, dG(x) = o(1),$

$\int_0^{1/h} \frac{x^2 h^2}{h}\, dG(x) = o(1),$

(4)    $\int_0^{1/h} x^2\, dG(x) = o(h^{-1}),$

$\int_{1/2h}^{1/h} x^2\, dG(x) = o(h^{-1}),$

(5)    $\int_{1/2h}^{1/h} dG(x) = o(h).$

Since $\psi_1(t)$ is even, the smoothness of $\varphi(t)$, and so also of $\psi_1(t)$, at the point
t = 0 implies that $\psi_1'(0)$ exists and is zero. If $h \to +0$,

$\frac{\psi_2(h) - \psi_2(0)}{h} = \int_0^{\infty} \frac{\sin xh}{h}\, dH = \int_0^{1/h} + \int_{1/h}^{\infty} = A_h + B_h,$

$|B_h| \le h^{-1} \int_{1/h}^{\infty} |dH| \le h^{-1} \left( \int_{1/h}^{2/h} dG + \int_{2/h}^{4/h} dG + \int_{4/h}^{8/h} dG + \cdots \right) = h^{-1}\, o(h + h/2 + h/4 + \cdots) = o(1),$

by (3) and (5). Also

$A_h - \int_0^{1/h} x\, dH = \int_0^{1/h} \left( \frac{\sin hx}{hx} - 1 \right) x\, dH = \int_0^{1/h} O(x^2 h^2)\, x\, dG = \int_0^{1/h} O(x^2 h)\, dG = o(1),$

by (3) and (4). Thus

$\frac{\psi_2(h) - \psi_2(0)}{h} = o(1) + \int_0^{1/h} x\, dH = o(1) + \int_{-1/h}^{1/h} x\, dF,$

and so

$\frac{\varphi(h) - \varphi(0)}{h} = o(1) + i \int_{-1/h}^{1/h} x\, dF.$

It follows that the existence of (2) is equivalent to the existence of the right-hand
side derivative of $\varphi(t)$ at the point t = 0, or, on account of smoothness,
to the existence of $\varphi'(0)$. Moreover, the value of (2) is $-i\varphi'(0)$. This completes
the proof of Theorem 1.


2. Suppose that a function y(t) defined near the point & satisfies for h—- 0 
a relation 


W(to +h) = ag + ayph/LL+ +++ + aga V(k — 1)! + [ay + o(1)Jh*/k!, 


where ap, a1, °**, a are constants. Then a, is called the kth generalized de- 
rivative of y at the point &. It will be denoted by Yu(to). The existence 
and finiteness of y“ (tf) implies the existence of Y(t) and both numbers 
are equal. : 

Another generalization of higher derivatives is based on the consideration of
the symmetric differences

\[ \Delta_h^1 \psi(t_0) = \psi(t_0 + h) - \psi(t_0 - h), \]
\[ \Delta_h^2 \psi(t_0) = \psi(t_0 + 2h) - 2\psi(t_0) + \psi(t_0 - 2h), \]
\[ \Delta_h^3 \psi(t_0) = \psi(t_0 + 3h) - 3\psi(t_0 + h) + 3\psi(t_0 - h) - \psi(t_0 - 3h). \]

If $\Delta_h^k \psi(t_0)/(2h)^k$ tends to a limit as $h \to +0$, this limit is called the kth
symmetric derivative of $\psi$ at the point $t_0$. We shall denote it by $D_k\psi(t_0)$. Clearly,
$D_k\psi(t_0)$ exists and equals $\psi_{(k)}(t_0)$, if the latter number exists.
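
As a numerical illustration of these definitions, the following sketch evaluates the symmetric differences through the closed form $\Delta_h^k \psi(t_0) = \sum_j (-1)^j \binom{k}{j} \psi(t_0 + (k - 2j)h)$, which reduces to the three differences displayed above for k = 1, 2, 3. The function names, the step h = 0.001 and the two test functions are illustrative assumptions and are not taken from the paper.

```python
from math import comb, exp


def symmetric_difference(psi, t0, h, k):
    """k-th symmetric difference: sum_j (-1)^j C(k, j) psi(t0 + (k - 2j) h)."""
    return sum((-1) ** j * comb(k, j) * psi(t0 + (k - 2 * j) * h)
               for j in range(k + 1))


def symmetric_derivative(psi, t0, k, h=1e-3):
    """Finite-h approximation of the k-th symmetric derivative D_k psi(t0)."""
    return symmetric_difference(psi, t0, h, k) / (2 * h) ** k


def gauss_cf(t):
    """Characteristic function of the standard normal law (real-valued)."""
    return exp(-t * t / 2)


# |t| has no ordinary derivative at 0, yet its first symmetric difference vanishes.
d1_abs = symmetric_derivative(abs, 0.0, 1)
# For the normal characteristic function the second symmetric derivative is near -1.
d2_gauss = symmetric_derivative(gauss_cf, 0.0, 2)
```

The first value shows that a symmetric derivative may exist where the ordinary one does not; the second agrees, up to sign, with the second moment of the standard normal distribution.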

It is a simple matter to prove (see [3]) that if k is a positive even integer,
and if the characteristic function $\varphi(t)$ has at $t = 0$ a finite symmetric derivative
$D_k\varphi(0)$, then the kth moment $\int_{-\infty}^{+\infty} x^k\, dF(x)$ exists, and its value is $(-1)^{k/2} D_k\varphi(0)$.
Conversely, the existence of $\int_{-\infty}^{+\infty} x^k\, dF(x)$ obviously implies (for k even) the
existence and continuity of $\varphi^{(k)}(t)$ for all $t$, and in particular at the point $t = 0$.

In order to obtain an extension of Theorem 1 to the case of derivatives of
odd order, we have to generalize the notion of smoothness. We shall say that
a function $\psi(t)$ satisfies for $t = t_0$ condition $S_k$, $(k = 1, 2, \cdots)$, if

\[ \Delta_h^{k+1} \psi(t_0) = o(h^k) \quad \text{as } h \to +0. \]

For k = 1, condition $S_1$ is identical with smoothness at $t_0$. Clearly, if $\psi_{(k)}(t_0)$
exists, $\psi$ satisfies condition $S_k$ at $t_0$.

THEOREM 2. Suppose that k is a positive odd integer, and let $\varphi(t)$ be the
characteristic function of a distribution function $F(x)$. If $\varphi$ satisfies condition $S_k$
at the point 0, a necessary and sufficient condition for the existence of $D_k\varphi(0)$ is
the existence of the symmetric moment

\[ \int_{-\infty}^{+\infty} x^k\, dF(x) = \lim_{X \to +\infty} \int_{-X}^{X} x^k\, dF(x), \tag{6} \]

whose value is then equal to $i^{-k} D_k\varphi(0)$. In particular, the existence of $\varphi_{(k)}(0)$
implies that of (6).

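The proof of Theorem 2 is analogous to that of Theorem 1. Let $G(x)$ and
$H(x)$ have the same meaning as before. Since k + 1 is even, condition $S_k$
at the point $t = 0$ gives

\[ \Delta_h^{k+1} \varphi(0) = \int_{-\infty}^{\infty} (e^{ixh} - e^{-ixh})^{k+1}\, dF(x) = (2i)^{k+1} \int_{-\infty}^{\infty} (\sin xh)^{k+1}\, dF(x)
 = (2i)^{k+1} \int_0^{\infty} (\sin xh)^{k+1}\, dG(x) = o(h^k), \]

so that

\[ \int_0^{1/h} (\sin xh)^{k+1}\, dG(x) = o(h^k), \]

\[ \int_0^{1/h} x^{k+1}\, dG(x) = o(h^{-1}), \tag{7} \]

\[ \int_{1/2h}^{1/h} dG(x) = o(h^k). \tag{8} \]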
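On the other hand,

\[ i^{-k}\, \frac{\Delta_h^k \varphi(0)}{(2h)^k} = \int_{-\infty}^{\infty} \Big( \frac{\sin xh}{h} \Big)^k dF(x) = \int_0^{\infty} \Big( \frac{\sin xh}{xh} \Big)^k x^k\, dH(x)
 = \int_0^{1/h} + \int_{1/h}^{\infty} = A_h + B_h, \]

\[ |B_h| \le h^{-k} \int_{1/h}^{\infty} dG(x) = h^{-k} \Big[ \int_{1/h}^{2/h} + \int_{2/h}^{4/h} + \cdots \Big] dG
 = h^{-k} \Big[ o(h^k) + o\Big( \big( \tfrac{h}{2} \big)^k \Big) + \cdots \Big] = o(1), \]

by (8). Since

\[ \Big( \frac{\sin u}{u} \Big)^k = \{1 + O(u^2)\}^k = \{1 + O(u)\}^k = 1 + O(u) \]

for small u, we immediately obtain

\[ A_h - \int_0^{1/h} x^k\, dH(x) = \int_0^{1/h} O(h x^{k+1})\, dG(x) = o(1), \]

by (7). Collecting the results, we see that

\[ i^{-k}\, \frac{\Delta_h^k \varphi(0)}{(2h)^k} - \int_0^{1/h} x^k\, dH(x) = i^{-k}\, \frac{\Delta_h^k \varphi(0)}{(2h)^k} - \int_{-1/h}^{1/h} x^k\, dF(x) = o(1), \]

which completes the proof of Theorem 2.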
One more remark. By Theorem 2, the existence of the first moment is
equivalent to the existence of the first symmetric derivative

\[ D_1\varphi(0) = \lim_{h \to +0} [\varphi(h) - \varphi(-h)]/2h. \]

In Theorem 1 we have a corresponding result for the ordinary first derivative

\[ \varphi'(0) = \lim_{h \to 0} [\varphi(h) - \varphi(0)]/h. \]

There is no discrepancy here since at every point where $\varphi$ is smooth the two
notions of derivative are equivalent.
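
The agreement of the two difference quotients can be checked numerically on a small example. The sketch below assumes a hypothetical three-point distribution (values -1, 0, 2 with probabilities 0.2, 0.3, 0.5) and a fixed step h; both are illustrative choices and not part of the paper.

```python
import cmath

# Hypothetical discrete distribution used only for illustration.
values = [-1.0, 0.0, 2.0]
probs = [0.2, 0.3, 0.5]


def cf(t):
    """Characteristic function: sum of p * exp(i x t)."""
    return sum(p * cmath.exp(1j * x * t) for x, p in zip(values, probs))


h = 1e-4
d1_symmetric = (cf(h) - cf(-h)) / (2 * h)   # quotient defining D_1 phi(0)
d1_one_sided = (cf(h) - cf(0.0)) / h        # quotient defining phi'(0)

mean = sum(x * p for x, p in zip(values, probs))
moment_from_symmetric = (d1_symmetric / 1j).real   # i^{-1} D_1 phi(0)
```

Dividing the symmetric quotient by i recovers the first moment, and the one-sided quotient differs from the symmetric one only by a term of order h.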
A LOWER BOUND FOR THE VARIANCE OF SOME UNBIASED
SEQUENTIAL ESTIMATES


By D. BLACKWELL AND M. A. GIRSHICK


Howard University and Bureau of the Census


Consider a sequence of independent chance variables $x_1, x_2, \cdots$ with identical
distributions determined by an unknown parameter $\theta$. We assume that $E x_1 = \theta$
and that $W_k = x_1 + \cdots + x_k$ is a sufficient statistic for estimating $\theta$ from
$x_1, \cdots, x_k$. A sequential sampling procedure is defined by a sequence of
mutually exclusive events $S_k$ such that $S_k$ depends only on $x_1, \cdots, x_k$ and
$\sum P(S_k) = 1$. Define $W = W_k$ and $n = k$ when $S_k$ occurs. In a previous paper
by one of the authors [1] it was shown that if $S_k = W_k \cdot C(S_1 + \cdots + S_{k-1})$,
(where $C(A)$ denotes the event that A does not occur), the function $V(W, n) =
E(x_1 \mid W, n)$ is an unbiased estimate of $\theta$, and $\sigma^2(V) \le \sigma^2(x_1)$. It is the purpose
of this note to obtain a lower bound for $\sigma^2(V)$. Our result is:

THEOREM I. $\sigma^2(V) \ge \dfrac{\sigma^2(x_1)}{E(n)}$.

We remark that the lower bound is actually attained in the classical case of
samples of constant size N. For in this case, (see [1]), $V = E(x_1 \mid W_N) = W_N/N$.
In fact we shall show that in a sense this is the only case in which the lower bound
is attained.

The proof of Theorem I depends on certain properties of sums of independent
chance variables. These, formulated more generally than is required for the
proof of Theorem I, are given in

THEOREM II. Let $x_1, x_2, \cdots$ be independent chance variables with identical
distributions, having mean $\theta$ and variance $\sigma^2(x_1)$. Let furthermore $\{S_k\}$ be any
sequential test for which $E(n)$ is finite. Let $W = x_1 + \cdots + x_k$ when $n = k$.
Then

(a) $\sigma^2(W - \theta n) \le \sigma^2(x_1) E(n)$.

(b) If $\sigma^2(n)$ is finite, the equality sign holds in (a).

(c) $E[x_1(W - \theta n)] = \sigma^2(x_1)$.

PROOF OF (a). Write $y_i = x_i - \theta$, and define $Y = y_1 + \cdots + y_k$ when
$n = k$. By definition,

\[ \sigma^2(W - \theta n) = \sum_{k=1}^{\infty} \int_{S_k} (y_1 + \cdots + y_k)^2\, dP. \tag{1} \]


To prove (a), we must verify that the series on the right of expression (1)
converges and has sum $\le \sigma^2(x_1)E(n)$. Now

\[ \sum_{k=1}^{N} \int_{S_k} (y_1 + \cdots + y_k)^2\, dP
 \le \sum_{k=1}^{N-1} \int_{S_k} (y_1 + \cdots + y_k)^2\, dP + \int_{\{n \ge N\}} (y_1 + \cdots + y_N)^2\, dP \tag{2} \]

\[ = \sum_{k=1}^{N} \int_{\{n \ge k\}} y_k^2\, dP + 2 \sum_{k=2}^{N} \int_{\{n \ge k\}} y_k (y_1 + \cdots + y_{k-1})\, dP. \]

Since the event $\{n \ge k\}$ is independent of $y_k$, each term in the second sum
vanishes and the first sum becomes

\[ \sum_{k=1}^{N} \int_{\{n \ge k\}} y_k^2\, dP = \sigma^2(x_1) \sum_{k=1}^{N} P\{n \ge k\}
 = \sigma^2(x_1) [P\{n = 1\} + 2P\{n = 2\} + \cdots + N P\{n = N\} + N P\{n > N\}] \le \sigma^2(x_1) E(n). \tag{3} \]

This establishes Theorem II(a).
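PROOF OF THEOREM II(b). Write $z_i = |y_i|$ and let $Z = z_1 + \cdots + z_k$ when
$n = k$. From (a) it follows that $\sigma^2[Z - nE(z_1)]$ is finite. If in addition,
$\sigma^2(n) < \infty$ then $E(Z^2) < \infty$. Thus the series

\[ \sum_{k=1}^{\infty} \int_{S_k} (z_1 + \cdots + z_k)^2\, dP = \sum_{1 \le i, j \le k < \infty} \int_{S_k} z_i z_j\, dP \tag{4} \]

converges, so that the series

\[ \sum_{1 \le i, j \le k < \infty} \int_{S_k} y_i y_j\, dP \tag{5} \]

converges absolutely. The terms of the latter series may be arranged to yield

\[ \text{(A):} \quad \sum_{k=1}^{\infty} \int_{S_k} (y_1 + \cdots + y_k)^2\, dP = \sigma^2(W - \theta n) \]

or to yield

\[ \text{(B):} \quad \sum_{k=1}^{\infty} \int_{\{n \ge k\}} y_k^2\, dP + 2 \sum_{k=2}^{\infty} \int_{\{n \ge k\}} y_k (y_1 + \cdots + y_{k-1})\, dP = \sigma^2(x_1) E(n). \]

This proves Theorem II(b).

PROOF OF THEOREM II(c). It follows from Theorem II(a) that $E x_1(W - \theta n)$
is finite. If we show that

\[ E(W - \theta n \mid x_1) = x_1 - \theta, \quad \text{i.e. } E(Y \mid y_1) = y_1, \tag{6} \]

it will follow [1] that

\[ E[x_1(W - \theta n)] = E[x_1(x_1 - \theta)] = \sigma^2(x_1). \tag{7} \]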


To verify (6), it is sufficient to show that if $f(x_1)$ is the characteristic function
of an event depending only on $x_1$ (i.e. $f(x_1) = 1$ when the event occurs, $f(x_1) = 0$
otherwise)

\[ E(f y_1) = E(f Y). \tag{8} \]

Write $\varphi_1 = 0$, $\varphi_i = f \cdot (y_2 + \cdots + y_i)$, $i \ge 2$.
Then it is easily verified that

\[ E(\varphi_j \mid y_1, \cdots, y_i) = \varphi_i \quad \text{for } j \ge i, \tag{9} \]
\[ E|\varphi_i| \le i\, E|y_1|, \tag{10} \]
\[ E(\varphi_i) = 0. \tag{11} \]

Hence it follows [2] that $E\varphi = 0$ where $\varphi = \varphi_i$ when $n = i$. In our case $\varphi =
fY - f y_1$, and $E\varphi = 0$ yields (6). This completes the proof of Theorem II.

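PROOF OF THEOREM I. In [1] it is proved that $E(x_1(W - \theta n)) = E[V(W - \theta n)]$.
Hence employing Theorem II we get

\[ \sigma^2(x_1) = E[V(W - \theta n)] = \sigma(V)\, \sigma(W - \theta n)\, \rho \tag{12} \]

where $\rho$, $(0 \le \rho \le 1)$, is the coefficient of correlation between V and $W - \theta n$.
Substituting for $\sigma(W - \theta n)$ we get

\[ \sigma^2(x_1) \le \sigma(V)\, \sigma(x_1) \sqrt{E(n)}\, \rho \le \sigma(V)\, \sigma(x_1) \sqrt{E(n)}. \tag{13} \]

Solving for $\sigma(V)$ we finally obtain

\[ \sigma^2(V) \ge \frac{\sigma^2(x_1)}{E(n)} \tag{14} \]

which proves Theorem I.¹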
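
The bound (14) and the identity of Theorem II(b) can be checked exactly on a small example. The sketch below assumes Bernoulli trials with success probability theta and a stopping rule that depends only on (k, W_k) and is truncated at a fixed cap; the function name `analyse`, the rule "stop at the first success" and the cap of 3 are hypothetical choices made for illustration only.

```python
from fractions import Fraction
from itertools import product


def analyse(theta, stop, cap):
    """Exact E(n), E(W - theta n)^2 and variance of V = E(x_1 | W, n)
    for Bernoulli(theta) trials stopped at the first k with stop(k, W_k),
    or at k = cap. Assumes 0 < theta < 1."""
    cells = {}  # (W, n) -> [P(W, n), E(x_1 ; W, n)]
    for path in product((0, 1), repeat=cap):
        prob = Fraction(1)
        for x in path:
            prob *= theta if x else 1 - theta
        w = 0
        for k, x in enumerate(path, start=1):
            w += x
            if k == cap or stop(k, w):
                break
        cell = cells.setdefault((w, k), [Fraction(0), Fraction(0)])
        cell[0] += prob            # summing over continuations gives P(W, n)
        cell[1] += prob * path[0]  # contribution of x_1 on this cell
    e_n = sum(p * k for (w, k), (p, _) in cells.items())
    var_w = sum(p * (w - theta * k) ** 2 for (w, k), (p, _) in cells.items())
    var_v = sum(p * (m / p) ** 2 for p, m in cells.values()) - theta ** 2
    return e_n, var_w, var_v


theta = Fraction(1, 3)
sigma2 = theta * (1 - theta)

# Sequential rule: stop at the first success, but never later than k = 3.
e_n, var_w, var_v = analyse(theta, lambda k, w: w >= 1, cap=3)

# Classical case: constant sample size N = 3.
e_fix, var_w_fix, var_v_fix = analyse(theta, lambda k, w: False, cap=3)
```

Exact rational arithmetic is used so that the equality in Theorem II(b) and the attainment of the bound for constant sample size can be tested without rounding error.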

If $\sigma^2(n)$ is finite, the equality sign in (14) will hold if and only if $\rho = 1$. We
shall now prove the following.

THEOREM III. Let N be the minimum value of n for which $P(n = N) \ne 0$.
Then, a necessary and sufficient condition that $\rho = 1$ is that $P(n = N) = 1$.

PROOF. The sufficiency of this condition follows from the fact that if
$P(n = N) = 1$, $V = W/N$. To prove the necessity of this condition, we
observe that if $\rho = 1$, V is a linear function of $W - n\theta$. That is,

\[ V = \alpha (W - n\theta) + \beta. \tag{15} \]

Now, since $EV = \theta$ and $E(W - n\theta) = 0$, it follows that $\beta = \theta$. Also, since
by hypothesis $\sigma^2(V) = \sigma^2(x_1)/E(n)$ and $\sigma^2(W - n\theta) = \sigma^2(x_1)E(n)$, it follows
that $\alpha = 1/E(n)$. Hence the estimate V is given by

\[ V = \frac{W - n\theta}{E(n)} + \theta. \tag{16} \]





¹ Under certain regularity conditions Cramér has obtained the inequality

\[ \sigma^2(x) \ge 1 \Big/ E\Big( \frac{\partial \log f}{\partial \theta} \Big)^2 \]

where $f = f(x, \theta)$ is the density function of x ([3], p. 475). Thus with the same regularity
conditions, our inequality yields

\[ \sigma^2(V) \ge 1 \Big/ E(n)\, E\Big( \frac{\partial \log f}{\partial \theta} \Big)^2, \]

which is a special case of the results presented by J. Wolfowitz in this issue of the Annals.
Let N be defined as above. We note that $N < \infty$ since by hypothesis $E(n) < \infty$.
Let $V_N$ be the estimate of $\theta$ when the sequential test terminates with $n = N$.
Then $V_N = W/N$. Substituting this value in (16) we get

\[ \frac{W}{N} - \theta = \frac{N}{E(n)} \Big[ \frac{W}{N} - \theta \Big]. \tag{17} \]

We exclude the trivial case where $W = N\theta$. Then (17) yields $E(n) = N$.
That is $P(n = N) = 1$. This proves the theorem.
We remark that N may be a function of $\theta$ but for a fixed $\theta$, $n = N$ is fixed
when $\rho = 1$.
AN EXTENSION TO TWO POPULATIONS OF AN ANALOGUE OF
STUDENT’S t-TEST USING THE SAMPLE RANGE


By JOHN E. WALSH


Princeton University





1. Summary. The modified t-test considered by Daly’ (see [1]) is used to 
develop one-sided significance tests to decide whether the mean of a new normal 
population exceeds the mean of an old normal population having the same 
variance. Significance tests are also developed to decide whether the mean of 
the new population is less than the mean of the old population. These tests 
require very little computation for their application and are approximately as 
powerful as the most powerful tests of these hypotheses. 


2. Introduction. Let $r_1, \cdots, r_n$, $(n \le 10)$, be independently distributed
according to a normal distribution with zero mean and unit variance. Let $r_{(u)}$
denote the uth largest of the r’s. Then Daly has shown how to determine
numbers $g_\alpha$ such that

\[ \Pr[\bar r/(r_{(n)} - r_{(1)}) > g_\alpha] = \alpha, \qquad \Pr[\bar r/(r_{(n)} - r_{(1)}) < -g_\alpha] = \alpha. \tag{1} \]


This note will use these relations to develop easily applied significance tests to
decide whether the mean $\nu$ of a new normal population exceeds the mean $\mu$ of
an old normal population with the same variance. Significance tests are also
developed to test $\nu < \mu$. The simplest case considered is that of testing a new
sample value x on the basis of n past sample values $y_1, \cdots, y_n$. Then the
significance test at significance level $\alpha$ to decide whether $\nu$ exceeds $\mu$ consists in
accepting $\nu > \mu$ if

\[ x > \bar y + g_\alpha \sqrt{n + 1}\, [y_{(n)} - y_{(1)}], \]

where $y_{(u)}$ is the uth largest of $y_1, \cdots, y_n$.
The significance test of $\nu < \mu$ consists in accepting $\nu < \mu$ if

\[ x < \bar y - g_\alpha \sqrt{n + 1}\, [y_{(n)} - y_{(1)}]. \]

¹ This problem is also considered by Lord in [2]. This note was in proof when [2] appeared.










These tests are generalized to the case in which x is the mean of a sample of
size r from the new population, each of $y_1, \cdots, y_n$ is the mean of a sample of
size s from the old population, and z is the mean of a sample of size t from the
old population. Then the tests at significance level $\alpha$ take the form

\[ \begin{aligned}
\text{Accept } \nu > \mu \text{ if } x &> (1 - C_1)\bar y + C_1 z + g_\alpha [y_{(n)} - y_{(1)}]; \\
\text{Accept } \nu < \mu \text{ if } x &< (1 - C_1)\bar y + C_1 z - g_\alpha [y_{(n)} - y_{(1)}],
\end{aligned} \tag{2} \]

where $C_1$ is a given constant which is selected by the person applying the test.
The introduction of the terms z and $C_1$ allows less reliable past information to
be utilized by lumping it together in the z term and using the constant $C_1$ to
weight this information according to its relative importance with respect to
the y’s.

The power of test (2) is compared with that of the corresponding Student t-test
for the case $C_1 = 0$ and $n \le 10$. In this comparison the quantities $x, y_1, \cdots, y_n$
are considered to be the given sample values which are used for the test, that is,
the quantities from which the means $x, y_1, \cdots, y_n$ were formed are not given.
It is found that the power of the Student t-test is only slightly greater than that
of the corresponding test (2). For the cases considered, however, it is well
known that the most powerful test of $\nu > \mu$ using the quantities $x, y_1, \cdots, y_n$
is the appropriate Student t-test. Similarly for testing $\nu < \mu$. Thus the tests
(2) considered are approximately as powerful as the most powerful tests of
$\nu > \mu$ and $\nu < \mu$ which use $x, y_1, \cdots, y_n$.

Examination of (2) shows that the amount of computation required for the 
application of one of these tests is small. Consequently the tests (2) have the 
desirable properties of being easily computed and nearly as powerful as any 
tests which could be used for the given hypotheses. This suggests their use in 
repetitive testing procedures which are concerned with the testing of the mean 
of a new sample on the basis of the means of previous samples. 
3. Statement of tests. In this section three significance tests of increasing
generality are stated. It is to be observed that each test is a particular example
of the test following it so that tests (A) and (B) are special cases of test (C).
The reason for stating tests (A) and (B) is that these tests have a much simpler
appearance and will cover most cases of practical application.

(A). Let each of $x, y_1, \cdots, y_n$ represent the mean of a sample of size r; let
the values of the sample whose mean is x have the distribution $N(\nu, \sigma^2)$ and the
values of the samples whose means are $y_1, \cdots, y_n$ have distribution $N(\mu, \sigma^2)$,
where the notation $N(\xi, \sigma^2)$ denotes the normal distribution with mean $\xi$ and
variance $\sigma^2$. Then the significance test of $\nu > \mu$ at significance level $\alpha$ is

\[ \text{Accept } \nu > \mu \text{ if } x > \bar y + g_\alpha \sqrt{n + 1}\, [y_{(n)} - y_{(1)}]. \]

The significance test to decide whether $\nu < \mu$ is

\[ \text{Accept } \nu < \mu \text{ if } x < \bar y - g_\alpha \sqrt{n + 1}\, [y_{(n)} - y_{(1)}]. \]


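(B). Let x equal the mean of r sample values from $N(\nu, \sigma^2)$ and each of
$y_1, \cdots, y_n$ equal the mean of s sample values from $N(\mu, \sigma^2)$. The significance
test for $\nu > \mu$ at significance level $\alpha$ is

\[ \text{Accept } \nu > \mu \text{ if } x > \bar y + g_\alpha \sqrt{\frac{ns}{r} + 1}\; [y_{(n)} - y_{(1)}]. \]

The test of $\nu < \mu$ is given by

\[ \text{Accept } \nu < \mu \text{ if } x < \bar y - g_\alpha \sqrt{\frac{ns}{r} + 1}\; [y_{(n)} - y_{(1)}]. \]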


(C). Let x equal the mean of r sample values from $N(\nu, \sigma^2)$, each of $y_1, \cdots, y_n$
equal the mean of a sample of size s from $N(\mu, \sigma^2)$, z equal the mean of a sample
of size t from $N(\mu, \sigma^2)$, and $C_1$ be a given constant value. Then the significance
test of $\nu > \mu$ at significance level $\alpha$ is

Accept $\nu > \mu$ if

\[ x > (1 - C_1)\bar y + C_1 z + [y_{(n)} - y_{(1)}]\, g_\alpha \sqrt{\Big( \frac{s}{r} + C_1^2 \frac{s}{t} \Big) n + (1 - C_1)^2}. \]

The significance test to decide whether $\nu < \mu$ is

Accept $\nu < \mu$ if

\[ x < (1 - C_1)\bar y + C_1 z - [y_{(n)} - y_{(1)}]\, g_\alpha \sqrt{\Big( \frac{s}{r} + C_1^2 \frac{s}{t} \Big) n + (1 - C_1)^2}. \]

Values of $g_\alpha$ for $\alpha = .05$ are given in Table I. These values were listed by
Daly in [1].²

² Values of $g_\alpha$ for $\alpha$ = .05, .025, .01, .005, .001, and .0005 are listed in Table 9 of [2] for
sample sizes from 2 to 20.
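
To show how little computation test (B) requires, the following sketch computes its two acceptance limits at the .05 level. The dictionary holds the values of $g_{.05}$ from Table I; the function name `limits_test_b` and the sample data are hypothetical, and $y_{(n)} - y_{(1)}$ is taken to be the sample range (largest minus smallest y).

```python
from math import sqrt

# Values of g_.05 for n = 3, ..., 10, copied from Table I.
G_05 = {3: 0.882, 4: 0.526, 5: 0.385, 6: 0.309,
        7: 0.260, 8: 0.227, 9: 0.202, 10: 0.183}


def limits_test_b(ys, r, s):
    """Return (lower, upper) limits of test (B) at the .05 level.

    Accept nu > mu if x > upper; accept nu < mu if x < lower.
    ys are the n past means (each of s values); x is a mean of r values.
    """
    n = len(ys)
    y_bar = sum(ys) / n
    sample_range = max(ys) - min(ys)
    half_width = G_05[n] * sqrt(n * s / r + 1) * sample_range
    return y_bar - half_width, y_bar + half_width


# Hypothetical data: five past values and single observations (r = s = 1),
# in which case test (B) reduces to test (A).
lower, upper = limits_test_b([1.0, 2.0, 3.0, 4.0, 5.0], r=1, s=1)
accept_nu_greater = 7.0 > upper
```

Only the mean and the range of the past values enter the computation, which is the practical advantage claimed for these tests.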
4. Derivation of tests. As tests (A) and (B) are particular cases of test
(C), it is sufficient to derive test (C).

TABLE I
Estimated Values of $g_{.05}$

 n     $g_{.05}$
 3     .882
 4     .526
 5     .385
 6     .309
 7     .260
 8     .227
 9     .202
10     .183


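Let the quantities $x', y_1', \cdots, y_n', z'$ be defined by

\[ x' = \frac{(x - \nu)\sqrt r}{\sigma}, \qquad y_i' = \frac{(y_i - \mu)\sqrt s}{\sigma}, \ (i = 1, \cdots, n), \qquad z' = \frac{(z - \mu)\sqrt t}{\sigma}. \]

Then $x', y_1', \cdots, y_n', z'$ are independently distributed according to N(0, 1).
Define

\[ r_u = \frac{1}{K_1} \Big( K_1 y_u' - \sum_1^n y_i' + K_2 x' + K_2 C z' \Big), \qquad (u = 1, \cdots, n). \]

It is easily verified that

\[ E(r_u) = 0, \qquad E(r_u^2) = 1 + \frac{1}{K_1^2} [(1 + C^2)K_2^2 - 2K_1 + n], \]

\[ E(r_u r_v) = \frac{1}{K_1^2} [(1 + C^2)K_2^2 - 2K_1 + n], \qquad (u \ne v). \]

Thus, if $K_1$ and $K_2$ satisfy the equations

\[ \Big( \sqrt{\frac{r}{s}} + C \sqrt{\frac{t}{s}} \Big) K_2 + K_1 - n = 0, \qquad (1 + C^2)K_2^2 - 2K_1 + n = 0, \tag{3} \]

the $r_u$ will be independent of $r_v$ when $u \ne v$. Also they will be independently
distributed according to N(0, 1).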
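Rewriting the $r_u$ in terms of $x, y_1, \cdots, y_n, z$ one obtains

\[ r_u = \frac{\sqrt s}{\sigma K_1} \Big[ K_1 y_u - \sum_1^n y_i + K_2 \sqrt{\frac{r}{s}}\, x + K_2 C \sqrt{\frac{t}{s}}\, z + K_2 \sqrt{\frac{r}{s}}\, (\mu - \nu) \Big]. \tag{4} \]

Using (3) the mean of the $r_u$ is found to be

\[ \bar r = \frac{K_2 \sqrt r}{\sigma K_1} \Big[ x - \Big( 1 + C \sqrt{\frac{t}{r}} \Big) \bar y + C \sqrt{\frac{t}{r}}\, z - (\nu - \mu) \Big]. \]

Let $r_{(u)}$ denote the uth largest of $r_1, \cdots, r_n$. Then from (1)

\[ \alpha = \Pr[\bar r/(r_{(n)} - r_{(1)}) > g_\alpha]
 = \Pr\Big[ \frac{K_2 \sqrt r}{K_1 \sqrt s} \Big( x - \Big( 1 + C \sqrt{\frac{t}{r}} \Big) \bar y + C \sqrt{\frac{t}{r}}\, z - (\nu - \mu) \Big) \Big/ (y_{(n)} - y_{(1)}) > g_\alpha \Big]. \]

It is easily seen from (3) that

\[ \frac{K_1}{K_2} = \pm \sqrt{(1 + C^2) n + \Big( \sqrt{\frac{r}{s}} + C \sqrt{\frac{t}{s}} \Big)^2}. \]

Choosing the positive sign, putting $C = -\sqrt{r/t}\; C_1$, and letting $\nu = \mu$ one obtains

\[ \Pr\Big\{ x > (1 - C_1)\bar y + C_1 z + [y_{(n)} - y_{(1)}]\, g_\alpha \sqrt{\Big( \frac{s}{r} + C_1^2 \frac{s}{t} \Big) n + (1 - C_1)^2} \Big\} = \alpha, \]

verifying the first part of test (C). The second part of test (C) is verified by
choosing the negative sign for $\dfrac{K_1 \sqrt s}{K_2 \sqrt r}$ (or by repeating the above argument using
the second part of (1)).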
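
The algebra of this derivation can be checked numerically. The sketch below solves the two equations (3) for $K_1$ and $K_2$ with the positive sign and confirms that $(K_1/K_2)\sqrt{s/r}$ equals the radical appearing in test (C). The function name `solve_k` and the numerical values of n, r, s, t and $C_1$ are arbitrary assumptions chosen for illustration.

```python
from math import sqrt


def solve_k(n, r, s, t, c1):
    """Solve equations (3) for K_1, K_2 (positive sign), with C = -sqrt(r/t) C_1."""
    c = -sqrt(r / t) * c1
    a = sqrt(r / s) + c * sqrt(t / s)
    # Eliminating K_1 = n - a K_2 gives (1 + C^2) K_2^2 + 2 a K_2 - n = 0.
    k2 = (-a + sqrt(a * a + n * (1 + c * c))) / (1 + c * c)
    k1 = n - a * k2
    return c, a, k1, k2


n, r, s, t, c1 = 5, 2.0, 3.0, 4.0, 0.25
c, a, k1, k2 = solve_k(n, r, s, t, c1)

first_equation = a * k2 + k1 - n
second_equation = (1 + c * c) * k2 ** 2 - 2 * k1 + n
factor_from_k = (k1 / k2) * sqrt(s / r)
factor_in_test_c = sqrt((s / r + c1 ** 2 * s / t) * n + (1 - c1) ** 2)
```

Both residuals vanish up to rounding and the two factors coincide, which is the content of the step from (3) to test (C).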









5. Power comparison with t-test. Let $x, y_1, \cdots, y_n$ satisfy the conditions
of test (B) in section 3. Then Student’s t using $x, y_1, \cdots, y_n$ is given by

\[ t = \frac{[x - \bar y - (\nu - \mu)] \sqrt{n - 1}}{\sqrt{\sum_1^n (y_i - \bar y)^2}\; \sqrt{\dfrac{s}{r} + \dfrac{1}{n}}}. \]

The Student t-test based on this value of t furnishes the most powerful test of
$\nu > \mu$ (and $\nu < \mu$) using $x, y_1, \cdots, y_n$. The purpose of this section is to show
that test (B) has approximately the same power as this Student t-test for $n \le 10$.

Daly has shown (see [1]) that if $r_1, \cdots, r_n$ are independently distributed
according to $N(\xi, \sigma^2)$, then the test based on

\[ (\bar r - \xi)/(r_{(n)} - r_{(1)}) \]

has approximately the same power for testing $\xi > 0$ (and $\xi < 0$) as the
corresponding Student t-test based on

\[ t = \frac{(\bar r - \xi) \sqrt{n(n - 1)}}{\sqrt{\sum_1^n (r_i - \bar r)^2}} \tag{5} \]

for $n \le 10$.


Using the notation of section 4 let

\[ r_u = \frac{\sqrt s}{K_1} \Big[ K_1 y_u - \sum_1^n y_i + K_2 \sqrt{\frac{r}{s}}\, x \Big], \qquad (u = 1, \cdots, n), \]

where $K_1/K_2 > 0$. Then from consideration of (4) with C = 0 it is seen that the $r_u$
are independently distributed according to $N(\xi, \sigma^2)$, where $\xi$ equals a positive
constant times $(\nu - \mu)$. Following the derivations in section 4 with C = 0,
it is seen that the test of $\xi > 0$ with this particular choice of the $r_u$ is identical
with the test of $\nu > \mu$ given in (B) of section 3. Similarly the test of $\xi < 0$ is
identical with the test (B) of $\nu < \mu$. Thus the test (B) has approximately the
same power for testing $\nu > \mu$ (and $\nu < \mu$) as the Student t-test based on the value
of t given in (5) if $n \le 10$. Replacing the $r_u$ in (5) by their values in terms of
$x, y_1, \cdots, y_n$, n, r, and s, it is found that (5) becomes

\[ t = \frac{[x - \bar y - (\nu - \mu)] \sqrt{n - 1}}{\sqrt{\sum_1^n (y_i - \bar y)^2}\; \sqrt{\dfrac{s}{r} + \dfrac{1}{n}}}. \]

This proves that test (B) is approximately as powerful for testing $\nu > \mu$ and
$\nu < \mu$ as the most powerful test based on the quantities $x, y_1, \cdots, y_n$ if $n \le 10$.
As test (A) is a particular case of test (B), these results also apply to test (A).
ON THE NORM OF A MATRIX


By ALBERT H. BOWKER
University of North Carolina


In studying the convergence of iterative procedures in matrix computation
and in setting limits of error after a finite number of steps, Hotelling [1] used
the square root of the sum of squares of the elements of a matrix as its norm. A
wide class of functions exists which may be employed as norms in matrix
calculation and substituted directly in the expressions derived by Hotelling. The
purpose of this note is to make a few general remarks about this class of functions
and to propose a new norm which appears to have some value in computation.

A function $\phi(A)$ of the elements of a real matrix A may be termed a legitimate
norm if it has the following four properties:

(1) $\phi(cA) = |c|\, \phi(A)$, c a scalar;

(2) $\phi(A + B) \le \phi(A) + \phi(B)$, if A + B is defined;

(3) $\phi(AB) \le \phi(A)\phi(B)$, if AB is defined;

(4) $\phi(e_{ij}) = 1$, where $e_{ij}$ is a fundamental unit matrix

whose elements are all zero except the one in the ith row and jth column, whose
value is unity. These four conditions are identical with the first four axioms
of Rella [2], who has shown them to be independent. Properties (1), (2), and
(3) are used directly in investigations of convergence and error, but the
importance of property (4) is indicated by some of its immediate consequences.
Clearly $e_i' A e_j = a_{ij}$, where $e_i$ is a fundamental unit vector. From (3) and (4)
it follows that $|a_{ij}| \le \phi(A)$ for all i and j and we have that

\[ \max_{i,j} |a_{ij}| \le \phi(A). \tag{5} \]

Thus $\phi(A)$ has the useful property that the norm of a matrix of errors exceeds
or equals the maximum possible error. Since $\phi(A^m) \le \phi^m(A)$, it follows from
(5) that the elements of $A^m$ will tend to zero as m increases if $\phi(A) < 1$, a result
which is useful in establishing convergence. Also $\phi(A) \ge 0$.

One further consequence of (1) to (4) is of interest. Suppose A is a square
matrix and let $\lambda$ be any of its roots. Then there exists a non-null vector x
such that $Ax = \lambda x$. Now $\phi(Ax) = |\lambda|\, \phi(x) \le \phi(A)\phi(x)$ and we have

\[ |\lambda| \le \phi(A). \tag{6} \]

Thus, every legitimate norm is an upper bound to the characteristic roots.
Clearly many functions exist which satisfy (1) to (4). The norm used by
Hotelling is $N(A) = \sqrt{\sum_{i,j} a_{ij}^2}$. A new norm which may have some value is
obtained as follows:

\[ R(A) = \max_i R_i(A) \tag{7} \]

where

\[ R_i(A) = \sum_j |a_{ij}|. \]


2 
Clearly $R(cA) = |c|\, R(A)$. To show that R satisfies (2), consider

\[ R_i(A + B) = \sum_j |a_{ij} + b_{ij}| \le \sum_j |a_{ij}| + \sum_j |b_{ij}| \le R(A) + R(B). \]

Since the above inequality holds for all i,

\[ R(A + B) \le R(A) + R(B). \]

Now $AB = \| \sum_\alpha a_{i\alpha} b_{\alpha j} \|$ and

\[ R_i(AB) = \sum_j \Big| \sum_\alpha a_{i\alpha} b_{\alpha j} \Big| \le \sum_\alpha |a_{i\alpha}|\, R_\alpha(B) \le R(B) R(A). \]

Hence $R(AB) \le R(A)R(B)$. Clearly $R(e_{ij}) = 1$. Similarly it may be shown
that $C(A) = \max_j \sum_i |a_{ij}|$ also satisfies the conditions of a norm.
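
The four functions discussed in this note are easy to compute, and properties (3), (5) can be spot-checked on an example. In the sketch below the functions follow the definitions of N, R and C given in the text, together with the sum of absolute values of all elements; the helper `matmul` and the matrices A, B and D are hypothetical examples chosen for illustration.

```python
from math import sqrt


def norm_n(a):
    """N(A): square root of the sum of squares of the elements."""
    return sqrt(sum(x * x for row in a for x in row))


def norm_r(a):
    """R(A): largest row sum of absolute values."""
    return max(sum(abs(x) for x in row) for row in a)


def norm_c(a):
    """C(A): largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def norm_p(a):
    """P(A): sum of the absolute values of all elements."""
    return sum(abs(x) for row in a for x in row)


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[1.0, -2.0], [3.0, 4.0]]
B = [[0.5, 1.0], [-1.0, 2.0]]
AB = matmul(A, B)
norms = (norm_n, norm_r, norm_c, norm_p)

# A case in which R is below one while N is not.
D = [[0.9, 0.0], [0.0, 0.9]]
```

For the diagonal matrix D the row-sum norm is 0.9 while the sum-of-squares norm exceeds one, so the two norms can lead to different conclusions about convergence.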


Since the convergence of an iterative procedure is often proved by the norm
being less than one, since the norm appears in the upper bound for the error
after a finite number of iterations, and since the norm of a matrix of errors is
taken to indicate the magnitude of the errors, a reasonable method of choosing
among several available legitimate norms is to select the smallest. It is natural
to inquire whether an optimum norm in this sense exists; that is, is there a
function $\phi^*(A)$ such that $\phi^*(A)$ possesses properties (1) through (4) and such
that $\phi^*(A) \le \phi(A)$ for all other $\phi(A)$ satisfying these conditions. Assume such
a $\phi^*(A)$ does exist. Clearly $\phi^*(A) = \phi^*(A')$, as, if either exceeded the other,
the smaller could be taken as $\phi^*(A)$. Let $\Lambda^2$ be the largest root of $AA'$. Then
by (6)

\[ \Lambda^2 \le \phi^*(AA') \le \phi^{*2}(A) \quad \text{and} \quad \Lambda \le \phi^*(A). \]

But Rella [2] has shown that $\Lambda$ possesses (1) to (4). Thus

\[ \phi^*(A) = \Lambda. \]

But, for a row vector, $C(A) \le \Lambda$. Consequently, no minimal norm exists. It is
interesting to note that a worst norm does exist, namely $P(A) = \sum_{i,j} |a_{ij}|$.
Since $A = \sum_{i,j} e_{ij} a_{ij}$, $\phi(A) \le P(A)$. Clearly $P(A)$ satisfies (1) to (4) and hence
is the worst possible legitimate norm.

In practical computation, the choice so far is between N(A) and R(A) (or
C(A)). No general inequalities exist and it would probably be advisable to
compute both. R(A) may be less than N(A) and indicate convergence when
N(A) fails to do so. Often R(A) may be computed visually and convergence
proved without computing the sum of squares of the elements.

The functions N(A) and R(A) may also be useful in finding a simple first
approximation to $A^{-1}$. A sufficient condition that Hotelling’s iterative method
for finding the inverse of a matrix A will converge is that the roots of
$D = 1 - AC_0$ be less than one in absolute value where $C_0$ is a first approximation
to $A^{-1}$. If the iterative procedure is to be carried out by a fully automatic
computing machine such as the one described by Alt [3] it may be advisable to
start with a rather poor first approximation which is easy to construct. If A
has positive roots and if M is any upper bound to these roots and if $C_0$ is a matrix
with diagonal elements equal to 1/M and zeros elsewhere, the iterative procedure
will converge but the norm of D will not necessarily be less than one. From
(6), any legitimate norm may be taken as M.
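
A minimal sketch of this starting rule follows. The note does not write out the iteration itself; the update $C_{m+1} = C_m(2 - AC_m)$ used below is the form commonly associated with Hotelling's method and is an assumption here, as are the function names, the number of steps and the example matrix (symmetric with positive roots). The bound M is taken to be R(A).

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def iterative_inverse(a, steps=20):
    """Iterate C <- C (2 - A C), starting from C_0 = (1/M) * identity, M = R(A)."""
    n = len(a)
    m = max(sum(abs(x) for x in row) for row in a)   # R(A) bounds the roots by (6)
    c = [[(1.0 / m if i == j else 0.0) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        ac = matmul(a, c)
        two_minus_ac = [[(2.0 if i == j else 0.0) - ac[i][j] for j in range(n)]
                        for i in range(n)]
        c = matmul(c, two_minus_ac)
    return c


A = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical matrix with positive roots
C = iterative_inverse(A)
residual = matmul(A, C)        # should be close to the identity matrix
```

With this update the error matrix $1 - AC_m$ is squared at each step, so a crude start such as $C_0 = (1/M)$ times the identity is adequate whenever the roots of $1 - AC_0$ are below one in absolute value.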
Finally, it is interesting to point out the relation of this note to some work on
the problem of finding upper bounds to the roots. In fact, the inequalities
$|\lambda| \le N(A)$ and $|\lambda| \le R(A)$, which are consequences of (6), are Theorem 2 of
Farnell [4] and Theorem 3 of Barankin [5] respectively.
DEFINITION OF THE PROBABLE DEVIATION

By M. FRÉCHET

Faculty of Science, University of Paris





The probable deviation has recently been defined by E. J. Gumbel [1], [2]
as the smallest of the intervals corresponding to the probability ½. It so
happened that the author was led to an equivalent definition starting from a general
idea which may be applied to absolutely general cases and which, for this reason,
might be of interest.

In recent years, the author has been occupied with a study of random ele- 
ments of any nature (curves, surfaces, functions, qualitative elements), a study 
whose future seems promising, [3]. I gave a definition of the mean of such an 
element expressed by an abstract integral which, however, is only defined if the 
random element is situated in a metric vectorial (Wiener-Banach) space.¹ But²
a still more general definition is valid if the random element is placed in any 
metric space. It consists of taking, as mean position of the random element X, 
a fixed (non-statistical) element b = X such that the function of a which rep- 
resents the mean M(X, a)” of the squared distance of X to the fixed element a, 
is minimum for a = b. (In the case where X and a are numbers, and where 
M(X)’ is finite, we know that this minimum is reached and that there is one, 
and only one, determination b of a). This definition has the advantage of also 
defining the equiprobable position of X. This is a fixed element c = X such 
that M(X, a) is minimum for c = a. (If X and a are numbers, we know that 
this minimum is still reached, but may be so reached by several values of X). 

Since reading Gumbel’s paper, a still more general definition suggested itself. 






1 For the definition of metric vectorial spaces see [4]. 
2 See Note 2, p. 503 of [4]. 
The expressions $M(X, a)$ and $\sqrt{M(X, a)^2}$ themselves may be considered as
distances, but as distances of two random elements taken together. To each
of these distances corresponds as minimum, when a varies, a different “typical’”’ 
function X or X ---. Thus, without supposing anything about the space 
into which the different trials place X, we assume that we have defined a ‘“‘de- 
viation” of two random elements X, Y taken together. We represent this 
function of two random variables by (([X], [Y])), a notation which differs from 
the representation of the distance (X, Y) of the two positions X and Y with 
respect to a single trial. The lower boundary of the deviation (([X], [a])), a
function of a, which is reached for a = X defines a “typical” position X. More-
over, the value of this (([X], [X])) may be considered as a measure or, at least,
as a numerical ranging point of the dispersion of X. 

Let us abandon these generalities. They hold especially if the element X 
is a real valued random variable. Among the possible and reasonable’ expres- 
sions for the deviation (([X], [a])) of the numerical variate X from a fixed number 
a, we may use the equiprobable value of | X — a| which may be called the equi- 
probable deviation of X from a. Thus we have, on one side, a new “typical 
value” of X which will be a value of a such that the equiprobable deviation of X 
from a is minimum, and a new measure of dispersion which is the value of this 
minimum and which might be called simply the equiprobable deviation of X. 

In the case where X has everywhere a continuous and finite density of prob- 
ability w(X) we find, as typical value, what Gumbel calls the ‘midvalue”’ 
and represents by {, and, as equiprobable deviation, what Gumbel calls the 
“probable deviation”? and represents by ¢. 

We may also consider the discontinuous case, which was given as a problem 
to candidates of the ‘‘Certificat d’Etudes Supérieures de Calcul des Probabilités, 
Option Statistique Mathématique, Session May-June, 1944.”’ They had to 
solve various questions of which I cite the beginning below: 

“Consider n real numbers $x_1 \le x_2 \le \cdots \le x_n$ and represent, by $E_a$, a median
value of the deviations $|x_i - a|$ of the numbers $x_i$ and a. If a varies, $E_a$ has
a minimum E which is reached by one or several values A of a.

1) Explain, in a few words, the meaning of the values E and A.

2) For simplicity’s sake, suppose that n is odd (n = 2r + 1). How should
E and A be calculated practically? (To find the answer, investigate first how
$E_a$ varies if a varies only slightly).

3) In the case where n = 4s + 3 (s is an integer equal to, or larger than, zero) 


show that E < = D) 


” 
Tn-s- 


where aA = fei, & = 


The study of this new typical value and of this new equiprobable deviation 
has the advantage that their determination is very rapid and requires hardly 


3 See the Remark at end of note. 
any calculations. However, we have to note an important inferiority of the
equiprobable deviation of X compared to the mean and the standard deviations
of X. If one or the other of the last two deviations is zero, X is a fixed number
(except for the case of the probability zero). This property seems requested
by the intuitive meaning which we attribute to the dispersion, and to every
measure or any mark of it. Now, the equiprobable deviation lacks this property.
If, for instance, X has only three values: 0, 2, 1, the first two with the probability
0.249, and the last with the probability 0.502, the equiprobable deviation of X
will be zero, whereas X will be equal to its typical value 1 only with a
probability of 0.502, and not with a probability equal to unity. The same holds
for any distribution for which there is a point with probability exceeding ½.
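
For a sample of odd size n = 2r + 1 the computation is indeed short: the median of the deviations $|x_i - a|$ is the half-length of the smallest interval centred at a that contains r + 1 of the numbers, so E is half the length of the shortest interval containing r + 1 of them and A is its midpoint. The sketch below applies this; the function name and the two samples are hypothetical, and the second sample is a sample analogue of the three-valued example above.

```python
from statistics import median


def equiprobable_deviation(xs):
    """Return (E, A) for a sample of odd size n = 2r + 1.

    E is half the length of the shortest interval containing r + 1 of the
    sorted values; A is the midpoint of that interval.
    """
    xs = sorted(xs)
    n = len(xs)
    if n % 2 == 0:
        raise ValueError("this sketch handles odd sample sizes only")
    r = n // 2
    width, i = min((xs[i + r] - xs[i], i) for i in range(n - r))
    return width / 2, (xs[i] + xs[i + r]) / 2


E, A = equiprobable_deviation([1, 2, 4, 7, 11])

# A sample in which more than half of the values coincide:
E_tied, A_tied = equiprobable_deviation([0, 1, 1, 1, 2])
```

In the second sample the equiprobable deviation is zero although the values are not all equal, which is the inferiority noted in the text.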

Remark. The definitions of the mean and of the equiprobable position become
meaningless in the case that $M(X, a)$, or $M(X, a)^2$, is infinite. However, we
succeeded in surmounting the difficulty, and in reaching definitions which are valid
even in this case. If X is a number, the new definitions become equivalent to
the classical definitions of the mean and equiprobable value. The proofs are
given in two recent articles [5], [6].
THE GENERAL RELATION BETWEEN THE MEAN AND THE MODE
FOR A DISCONTINUOUS VARIATE


By M. FRÉCHET
Faculty of Science, University of Paris

Dr. Gumbel has pointed out that one of the author’s arguments employed in 
several particular cases (see [1]) can be employed in a general case which includes 
them and leads to the following result: If a statistical variate R has only positive 
entire values differing from zero, and if its mean value & is smaller than, or 
equal to, unity, the same holds for its equiprobable value R and its mode R. 
There are two generalizations of this result which might be of interest: 


1) On the one hand, the author has shown [2] that, if a variate R can only 
have values (entire or not) equal to, or larger than, zero, its equiprobable value 
R is, at most, equal to twice its mean value R, and the inequality R/R < 2
cannot be improved which means that the upper boundary of the first member 
is exactly equal to (and not less than) two. The equality is reached when R 
has only two values of equal probability, one of them being zero. 

2) On the other hand, if R is an integer positive variate equal to, or larger
than zero, it can be proven that, if $\bar R \le a$, we have

\[ \hat R \le \frac{a(a + 3)}{2}. \tag{1} \]

Here, $\bar R$ and $\hat R$ stand for the mean and for the mode of R respectively, and a is
a positive integer differing from zero. For example: if R is the number of
repetitions of an event with probability p, we have, for n trials, $\bar R = np$, whence,
if a is the first integer number equal to, or larger than, $\bar R$ we have the inequality
(1) for the most probable number of repetitions. Naturally, this inequality
only has an interest if the second member of (1) is smaller than n which means
that

\[ a(a + 3) < 2n. \]

This presupposes

\[ 2n > np(np + 3), \qquad n < \frac{2 - 3p}{p^2}, \]

and, since n must be positive,

\[ p < \tfrac{2}{3}. \]
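To prove the inequality (1), let us write $w_\nu$ for the probability that $R = \nu$. 
We have 

$\sum_{\nu=0}^{\infty} \nu w_\nu = \bar{R} \le a$, 

whence, since $\sum_{\nu=0}^{\infty} w_\nu = 1$, 

(2) $\sum_{\nu=0}^{a-1} (a-\nu)\, w_\nu \ge \sum_{\nu=a+1}^{\infty} (\nu-a)\, w_\nu$. 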


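Let the mode be 

$\hat{R} = \beta$; 

then 

$w_\beta \ge w_\nu, \quad \nu = 0, 1, 2, \ldots$, 

and the first member in (2) is bounded by 

(3) $\dfrac{a(a+1)}{2}\, w_\beta = w_\beta \sum_{\nu=0}^{a-1} (a-\nu) \ge \sum_{\nu=0}^{a-1} (a-\nu)\, w_\nu$. 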
Now, either $a < \beta$ or $\beta \le a$. In the first case the second member in (2) leads to 

(4) $\sum_{\nu=a+1}^{\infty} (\nu-a)\, w_\nu \ge (\beta-a)\, w_\beta$, 

since the second member in (4) is one of the terms occurring in the sum. The 
same inequality holds in the second case, $\beta \le a$, hence it holds generally. It 
follows from (2), (3), and (4) that 

$\dfrac{a(a+1)}{2}\, w_\beta \ge (\beta-a)\, w_\beta$. 

The probability $w_\beta$ is certainly different from zero, since $\sum_{\nu=0}^{\infty} w_\nu = 1$. 
Consequently 

$\beta \le a + \dfrac{a(a+1)}{2} = \dfrac{a(a+3)}{2}$, 

as stated in (1). 
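The equality in (1) is possible only if, from (3), 

$a(w_\beta - w_0) + (a-1)(w_\beta - w_1) + \cdots + (w_\beta - w_{a-1}) = 0$ 

and, from (4), 

$w_{a+1} + 2w_{a+2} + \cdots = (\beta - a)\, w_\beta$, 

whence 

(5) $w_0 = w_1 = \cdots = w_{a-1} = w_\beta$ 

and 

(5') $w_\nu = 0$ for $\nu > a$, $\nu \ne \beta$. 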


The existence of the exceptional case proves that the inequality (1) cannot 
be improved by replacing the second member by a smaller function of a. In 
the exceptional case, the only possible values of $R$ are 

$R = 0, 1, 2, \ldots, a-1, a, \beta$, 

and all values, except perhaps $a$, are equiprobable. The probability $w_a$ may 
be, but need not be, equal to $w_\beta$. 
Moreover 

(6) $\beta = \dfrac{a(a+3)}{2} \ge a$ 




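and $\beta = a$ is possible only if $a = \beta = 0$, whence, from (5), $w_\nu = 0$ except for 
$\nu = 0$, which means that $R$ only has one value, equal to zero. Except for this 
trivial case, we have in the exceptional case $\beta > a$, and there are $a + 2$ possible 
values for $R$. Then we must have 

$w_\beta \ge w_a; \quad \sum_{\nu=0}^{a} w_\nu + w_\beta = 1$, 

whence 

$(a+1)\, w_\beta + w_a = 1$ 

and, from (5), 

$a \ge \bar{R} = w_\beta \sum_{\nu=0}^{a-1} \nu + \beta w_\beta + a w_a = w_\beta \left( \dfrac{a(a-1)}{2} + \dfrac{a(a+3)}{2} \right) + a w_a = a\big((a+1)\, w_\beta + w_a\big)$, 

whence 

(7) $\bar{R} = a$. 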


From 


$1 = (a+1)\, w_\beta + w_a \le (a+2)\, w_\beta$ 


follows 


1 1 — we 
, een = 
(8) “as 2” WB o * 


These conditions (5), (5’), and (7) are necessary and sufficient for the existence 
of the exceptional case. 

If the equality in (1) is excluded, the mode $\beta$ and the smallest integer number 
$a$ which is equal to, or larger than, the mean, are related by 


2 

9) gs Me t3)_ ye t3at2 
2 2 

As shown before, this general inequality, valid for any discontinuous variate 
which can assume only non-negative integer values, cannot be improved without 
assuming specific properties of the distribution. 
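
As an illustrative check, not part of the note itself, the following Python sketch evaluates inequality (1) exactly with rational arithmetic, once for a binomial variate and once for the exceptional distribution described by (5), (5') and (7). The function names (`frechet_bound`, `extremal`, and so on) and the particular numbers are assumptions chosen for the illustration.

```python
from fractions import Fraction
from math import ceil, comb


def mean(w):
    """Mean of a distribution given as {value: probability}."""
    return sum(v * p for v, p in w.items())


def modes(w):
    """All values attaining the largest probability (a mode need not be unique)."""
    top = max(w.values())
    return [v for v, p in w.items() if p == top]


def frechet_bound(a):
    """Second member of inequality (1): a(a + 3) / 2."""
    return Fraction(a * (a + 3), 2)


def binomial(n, p):
    """Exact binomial probabilities for n trials with success probability p."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def extremal(a, w_a):
    """Exceptional case: the values 0..a-1 and beta share one probability w_beta.

    Assumes a >= 1 and w_a <= 1/(a + 2), so that w_a <= w_beta.
    """
    beta = a * (a + 3) // 2
    w_beta = (1 - w_a) / (a + 1)
    w = {v: w_beta for v in range(a)}
    w[a] = w_a
    w[beta] = w_beta
    return w, beta


# Binomial example: n = 20 trials, p = 1/10, so the mean is np = 2 and a = 2.
w = binomial(20, Fraction(1, 10))
a = ceil(mean(w))
print(max(modes(w)), "<=", frechet_bound(a))

# Exceptional case with a = 3: the largest mode reaches the bound a(a + 3)/2 = 9.
w, beta = extremal(3, Fraction(1, 10))
print(mean(w), beta, modes(w))
```

In the exceptional case the mode is not unique, which is why the sketch lists every value of maximal probability and compares the largest of them with the bound.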
NOTE ON DIFFERENTIATION UNDER THE EXPECTATION SIGN 
IN THE FUNDAMENTAL IDENTITY OF SEQUENTIAL ANALYSIS 


By T. E. Harris 


Princeton University 


Let $z$ be any chance variable and $z_1, z_2, z_3, \ldots$ a sequence of independent 
chance variables, each with the same distribution as $z$. Let $Z_N = z_1 + z_2 + \cdots + z_N$. 
Let $\varphi(t) = E e^{zt}$ for all complex $t$ for which the latter exists. Let $S_1, 
S_2, \ldots$ be a sequence of mutually exclusive events such that $S_j$ depends only 
on $z_1, z_2, \ldots, z_j$, and $\sum_{j=1}^{\infty} P(S_j) = 1$. Let the chance variable $n$ be defined 
as $n = j$ when $S_j$ occurs. Blackwell and Girshick [1], generalizing a result 
of Wald [2], showed that if there is a positive constant $M$ such that 

(1) $|Z_N| < M$ when $n > N$, 

then the identity 

(2) $E\{e^{Z_n t} (\varphi(t))^{-n}\} = 1$ 

holds for all complex $t$ for which $\varphi(t)$ exists and $|\varphi(t)| \ge 1$. Wald [3] 
established conditions, including the existence of $\varphi(t)$ for all real $t$, under which 
(2) may be differentiated under the expectation sign an unlimited number 
of times. 

Without assuming the existence of $\varphi(t)$ for a real $t$-interval the following result 
holds: If (1) is true and if $E(z^k)$ and $E(n^k)$ are both finite, $k$ a positive integer, 
then 

(3) $E\left\{ \dfrac{d^k}{ds^k} \left[ e^{isZ_n} (\varphi(is))^{-n} \right]_{s=0} \right\} = 0$, 

where $i = \sqrt{-1}$ and $s$ is real. Certain identities, obtained by differentiating 
(2) and putting $t = 0$, can also be obtained from (3). For example, if $Ez = 0$, 
and if $En^2$ and $Ez^2$ both exist, then $EZ_n^2 = Ez^2\, En$. 
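
The identities above can be checked exactly in a small case. The sketch below is an illustration under stated assumptions, not a construction taken from the note: it takes $z = \pm 1$ with probability 1/2 each (so $Ez = 0$, $Ez^2 = 1$, $\varphi(t) = \cosh t$) and a hypothetical stopping rule that stops as soon as $|Z_j|$ reaches a barrier or a fixed horizon is reached, so that condition (1) holds and all moments of $n$ are finite. The names `stopped_walk_law` and `fundamental_identity_lhs` are invented for the example.

```python
from fractions import Fraction
import math


def stopped_walk_law(barrier, horizon):
    """Exact joint law {(n, Z_n): probability} of a +/-1 walk.

    The walk stops at the first j with |Z_j| >= barrier, or at j = horizon.
    """
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}  # P(Z_j = z and not yet stopped)
    stopped = {}
    for j in range(1, horizon + 1):
        still_alive = {}
        for z, pr in alive.items():
            for step in (-1, 1):
                z_new = z + step
                if abs(z_new) >= barrier or j == horizon:
                    key = (j, z_new)
                    stopped[key] = stopped.get(key, 0) + pr * half
                else:
                    still_alive[z_new] = still_alive.get(z_new, 0) + pr * half
        alive = still_alive
    return stopped


law = stopped_walk_law(barrier=3, horizon=12)

# Second-moment identity: E Z_n^2 = E z^2 * E n, with E z^2 = 1 here.
E_n = sum(pr * n for (n, z), pr in law.items())
E_Z2 = sum(pr * z * z for (n, z), pr in law.items())
print(E_Z2 == E_n)


def fundamental_identity_lhs(t):
    """Left side of identity (2) for real t, using phi(t) = cosh(t) >= 1."""
    phi = math.cosh(t)
    return sum(float(pr) * math.exp(t * z) * phi ** (-n)
               for (n, z), pr in law.items())


print(fundamental_identity_lhs(0.3))  # should be 1 up to rounding error
```

Because the law of $(n, Z_n)$ is enumerated exactly, the moment identity holds as an equality of fractions rather than approximately.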

Let $P_N = P(n \le N)$; $p_N = P(n = N)$. Let $H(j, Z_j)$ and $F(N, Z_N)$ be the 
conditional cumulatives of $Z_j$ and $Z_N$ for $n = j$ and $n > N$ respectively. Now 
(2) was derived by Wald [2], p. 285, from a relation, valid whenever $\varphi(t)$ exists, 
which in the present notation becomes 

(4) $\sum_{j=1}^{N} p_j \int_{-\infty}^{\infty} (\varphi(t))^{-j} e^{Z_j t}\, dH(j, Z_j) + (1 - P_N)(\varphi(t))^{-N} \int_{-\infty}^{\infty} e^{Z_N t}\, dF(N, Z_N) = 1$. 

Examination of Wald's derivation of (4) shows it to be valid under the present 
hypotheses. Now the finiteness of $E(z^k)$ clearly implies that of $E(Z_j^k \mid n = j)$. 
Also, since $F(N, Z_N)$ is constant outside the interval $[-M, M]$, the integral 
$\int Z_N^k\, dF(N, Z_N)$ is finite. Hence we may set $t = is$ in (4) and differentiate 
$k$ times, obtaining for all real $s$ 

(5) $\sum_{j=1}^{N} p_j \int_{-\infty}^{\infty} \dfrac{d^k}{ds^k} \left[ e^{isZ_j} (\varphi(is))^{-j} \right] dH(j, Z_j) + (1 - P_N) \sum_{r=0}^{k} \binom{k}{r} \dfrac{d^r}{ds^r} \left[ (\varphi(is))^{-N} \right] \cdot \int_{-\infty}^{\infty} (iZ_N)^{k-r} e^{isZ_N}\, dF(N, Z_N) = 0$. 


The derivatives of $(\varphi(is))^{-N}$ are sums of terms of the form $Q(N) \cdot (\varphi(is))^{-N}$ 
times terms independent of $N$, where $Q(N)$ is a polynomial in $N$ of degree $\le k$. 
For any $r \le k$, 

$\lim_{N \to \infty} \left| (1 - P_N) N^r \right| = \lim_{N \to \infty} \left| N^r \sum_{j=N+1}^{\infty} p_j \right| \le \lim_{N \to \infty} \left| \sum_{j=N+1}^{\infty} j^r p_j \right| = 0$, 

since $En^k$ is finite. Hence $\lim (1 - P_N) Q(N) = 0$. Because of (1) the 
integrals in the second term of (5) are bounded as $N \to \infty$. Now set $s = 0$ in (5) 
and then let $N \to \infty$. Since $\varphi(0) = 1$, the second term of (5) approaches 0 
and the limit of the first term is just the left side of (3). 

For the case of a Wald sequential process, Stein [4] has shown that all moments 
of $n$ are finite. In this case (3) holds whenever $Ez^k$ is finite. 
A UNIQUENESS THEOREM FOR UNBIASED SEQUENTIAL 
BINOMIAL ESTIMATION 


By L. J. Savage¹ 
University of Chicago 


In a recent note [1], J. Wolfowitz extended some of the results of a paper by 
Girshick, Mosteller and Savage [2] on sequential binomial estimation. The 
present note carries one of Wolfowitz's ideas somewhat further. The 
nomenclature of [1] and [2] will be used freely. The concept of "doubly simple region" 
introduced in [1] and assumed there only in the hypothesis of Theorem 3, will 
here be shown to be unnecessarily restrictive. In so doing, we find that 
simplicity is not only a necessary (cf. Theorem 4 of [2]) but also a sufficient 
condition that $\hat{p}$ be the unique unbiased estimate of $p$ for a closed region. 

¹ The author is a Rockefeller fellow at the Institute of Radiobiology and Biophysics, 
University of Chicago. 

Lemma. If $R$ is simple there is at most one bounded unbiased estimate of any 
given function of $p$. 

Proof. If the lemma were false, there would be a non-trivial bounded 
unbiased estimate of zero, i.e., a function $m(\alpha)$, not identically zero, such that 
$|m(\alpha)|$ is bounded by a constant $m^*$ and 

(1) $E(m(\alpha) \mid p) = \sum m(\alpha) k(\alpha) p^y q^x = 0$. 

Since $R$ is simple we may assume (much as in 
the proof of Theorem 6 of [2]) that we have a boundary point $\alpha_0 = (x_0, y_0)$ such that 
$m(\alpha_0) \ne 0$, $\alpha_0$ is below all accessible points of its own index and also below 
every other $\alpha$ for which $m(\alpha) \ne 0$. Therefore 

(2) $|m(\alpha_0)|\, k(\alpha_0) p^{y_0} q^{x_0} = \left| \sum_{y > y_0} m(\alpha) k(\alpha) p^y q^x \right| \le m^* \sum_{y > y_0} k(\alpha) p^y q^x$. 

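Let $M$ denote the set of all accessible points and boundary points at which 
$x < x_0$ and $y = y_0 + 1$. There are at most $x_0$ points in $M$, say $\beta_1, \ldots, \beta_\nu$. 
Considering the way in which $\alpha_0$ has been chosen, every path from $(0, 0)$ to an $\alpha$ 
for which $y > y_0$ passes through or to at least one point of $M$. Therefore when 
$y > y_0$ 

(3) $P(\alpha) = k(\alpha) p^y q^x = P(\alpha \mid M) P(M) \le P(\alpha \mid M) \sum_{j=1}^{\nu} k(\beta_j) p^{y_0+1} = p^{y_0+1} \sum_{j=1}^{\nu} k(\beta_j) P(\alpha \mid M)$. 

From inequalities (2) and (3), 

(4) $|m(\alpha_0)|\, k(\alpha_0) p^{y_0} q^{x_0} \le m^* p^{y_0+1} \sum_{j=1}^{\nu} k(\beta_j) \sum_{y > y_0} P(\alpha \mid M) \le m^* p^{y_0+1} \sum_{j=1}^{\nu} k(\beta_j)$. 

But it is impossible that (4) should be satisfied for small $p$: dividing by $p^{y_0}$ 
and letting $p \to 0$, the first member tends to $|m(\alpha_0)|\, k(\alpha_0) > 0$ while the 
last member tends to zero. 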

Combining the Lemma with Theorem 4 of [2] we have the 

THEOREM. A necessary and sufficient condition that $\hat{p}(\alpha)$ be the unique proper 
(bounded) and unbiased estimate of $p$ for a closed region $R$ is that $R$ be simple. 

The sufficiency part of this Theorem extends Theorem 3 of [1] from doubly 
simple regions to simple regions. 

The author is indebted to J. Wolfowitz for his valuable suggestions in connec- 
tion with the present note. 
ABSTRACTS OF PAPERS 


Presented on January 25, 1947, at the Atlantic City meeting of the Institute 


1. A Test of Significance of the Coefficient of Rank Correlation for more than 
Thirty Ranked Items. Nilan Norris, Hunter College. 


Hotelling and Pabst (Annals of Math. Stat., Vol. 7 (1936), p. 37) have suggested the use 
of the Tchebycheff inequality as an approximation for testing the significance of the 
coefficient of rank correlation in cases where the number of ranked items is too large to enable 
exact probabilities to be computed directly. A table prepared in accordance with this 
suggestion indicates that for values of the coefficient of rank correlation larger than .50 
there is a wide range of corresponding numbers of ranked items greater than thirty for 
which at least the five per cent level of significance is satisfied. 

For certain types of applications the conservativeness of the Tchebycheff test may be 
a virtue rather than a limitation. 
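
A minimal sketch of such a Tchebycheff bound is given below. It assumes the standard null-hypothesis variance $1/(n-1)$ of the rank correlation coefficient for $n$ ranked items, which the abstract does not state, and it is not the author's table; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def chebyshev_tail_bound(r, n):
    """Tchebycheff upper bound on P(|rank correlation| >= r) for n ranked items.

    Assumes zero mean and variance 1/(n - 1) under the hypothesis of independence.
    """
    return min(Fraction(1), 1 / ((n - 1) * Fraction(r) ** 2))


def smallest_n_for_level(r, level):
    """Smallest n for which the bound does not exceed the given level."""
    return ceil(1 / (Fraction(level) * Fraction(r) ** 2)) + 1


# An observed coefficient of .50 tested at the five per cent level.
r, level = Fraction(1, 2), Fraction(1, 20)
n = smallest_n_for_level(r, level)
print(n, chebyshev_tail_bound(r, n))
```

Passing the arguments as `Fraction` values keeps the arithmetic exact, so the boundary case in which the bound equals the level is classified without rounding error.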


2. A Generalized T Measure of Multivariate Dispersion. Harold Hotelling, 
University of North Carolina. 


The problem of combining errors in two or more dimensions to measure the accuracy of 
firing and bombing is similar to problems occurring in industrial quality control where 
different measures of quality are applied to the same article, and to problems in mental 
testing and other fields. If the covariances were known a priori, the solution optimum 
in certain senses, for a multivariate normal distribution, would be the use of $\chi^2 = \sum\sum \lambda_{ij} x_i x_j$, 
where $[\lambda_{ij}]^{-1}$ is the covariance matrix and $x_i$ is the deviation in the $i$th dimension. Since 
the covariances must in all known practical cases be estimated from a preliminary sample 
with (say) $n$ degrees of freedom, $\chi^2$ may be replaced by $T^2 = \sum\sum l_{ij} x_i x_j$, where $[l_{ij}]^{-1}$ is 
the estimated covariance matrix. This is the same $T$ introduced by the author in 1931 
as a generalization of the Student ratio $t$, and has the same distribution. Upon adding 
together the values of $T^2$ for different cases (e.g. for different bombs dropped with the same 
bombsight), a combined measure $T_0^2$ of over-all excellence (e.g. of the bombsight) is 
obtained. $T_0^2$, like $\chi^2$, can be broken down into components meaningful with respect to the 
causal system, specifically in relation to possible sources of excessive discrepancy. Thus, 
if $\bar{x}_i$ is the $i$th coordinate of the centroid, or mean point of impact, of $m$ bombs, we may 
write $T_M^2 = \sum\sum l_{ij} \bar{x}_i \bar{x}_j$, $T_D^2 = T_0^2 - T_M^2$. Then $T_D$ is a function only of deviations from 
the mean point of impact. Asymptotically (for large $n$), $T_0^2$, $T_M^2$ and $T_D^2$ have the $\chi^2$ 
distribution with $m$, 2 and $m - 2$ degrees of freedom respectively. But the untrustworthiness 
of the $\chi^2$ distribution as an approximation is evident even with $n$ as large as 256, for which 
case calculations have been made. The exact distributions of $T_0$ and $T_D$ are ascertained 
when the number of variates $p$ is 2, and the probability integrals are expressed as linear 
functions of two incomplete beta functions. In fact, $T_0^2/m$ equals the sum of the roots 
of a determinantal equation of the form $|A - \lambda B| = 0$, where $A$ and $B$ are sample covariance 
matrices with $n$ and $m$ degrees of freedom respectively, and a similar relation holds for $T_D$ 
with $m$ replaced by $m - 2$. $T$ and $T_M$ have the distribution published in 1931, with 
probability integral expressible in terms of a single incomplete beta function or the variance 
ratio distribution. It is shown that such parameters as the circular mean deviation are 
best estimated with the help of the $T$ measures, not directly by averaging individual 
circular deviations. 


3. Asymptotic Properties of Maximum and Quasi-Maximum Likelihood 
Estimates. Herman Rubin, Cowles Commission for Research in Economics. 

The results of J. L. Doob (Trans. Am. Math. Soc., Vol. 36 (1934), pp. 759-775) on 
consistency of maximum likelihood estimates are generalized and extended to arbitrary 
measure spaces. In some special cases, results on asymptotic normality of maximum 
likelihood estimates can be generalized to quasi-maximum likelihood estimates (estimates based 
on the assumption of a likelihood function which need not be the true function). 


4. The Asymptotic Distribution of the Range. E. J. Gumbel, Newark College 
of Engineering. 


The asymptotic distribution of the range $w$ for initial unlimited distributions of the 
exponential type is obtained by convolution of the asymptotic distributions of the two 
extremes. Let $\alpha$ and $u$ be the parameters of the distributions of the extremes for a 
symmetrical variate, and let $R = \alpha(w - 2u)$ be the reduced range. Then the probability 
$\Psi(R)$ of the reduced range is subject to the differential equation $\Psi'' + \Psi' - \Psi \exp(-R) = 0$, 
which may be transformed into Bessel's equation of the first order by the substitutions 
$R = 2(\log 2 - \log z)$ and $\Psi = zU$. The solution is $\Psi(R) = zK_1(z)$ for the asymptotic 
probability, and $\psi(R) = (z^2/2)K_0(z)$ for the asymptotic distribution, $K_0(z)$ and $K_1(z)$ being the 
modified Bessel functions of the second kind of orders zero and unity. Thus tables of $\Psi(R)$ 
and $\psi(R)$ may be calculated for any symmetrical distribution of the exponential type. 
The distribution of the range $w$ for normal samples of size 10 is already very close to the 
asymptotic distribution provided that the parameters $\alpha$ and $u$ are determined from the 
mean and the standard deviation of the range. This method permits the calculation of 
the distribution of the range for normal samples of any size larger than 10. 
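
The stated solution can be checked numerically. The sketch below is an illustration only: it evaluates $K_0$ and $K_1$ from the integral representation $K_\nu(z) = \int_0^\infty e^{-z \cosh t} \cosh(\nu t)\, dt$ with a plain trapezoidal rule (a choice made here to avoid external libraries, adequate for moderate $z$ but not for very small $z$), and then tests by central differences that $\Psi(R) = zK_1(z)$ with $z = 2e^{-R/2}$ satisfies the differential equation and that $\Psi' = \psi$. All function names are hypothetical.

```python
import math


def bessel_k(order, z, upper=12.0, steps=6000):
    """Modified Bessel function K_order(z) by trapezoidal integration.

    Uses K_v(z) = integral_0^inf exp(-z cosh t) cosh(v t) dt, truncated at `upper`.
    Intended for moderate z (roughly z >= 0.05), where the tail is negligible.
    """
    h = upper / steps
    total = 0.5 * math.exp(-z)  # integrand at t = 0, with half weight
    for i in range(1, steps + 1):
        t = i * h
        total += math.exp(-z * math.cosh(t)) * math.cosh(order * t)
    return total * h


def z_of(R):
    """Substitution R = 2(log 2 - log z), i.e. z = 2 exp(-R / 2)."""
    return 2.0 * math.exp(-R / 2.0)


def Psi(R):
    """Asymptotic probability of the reduced range: z K_1(z)."""
    z = z_of(R)
    return z * bessel_k(1, z)


def psi(R):
    """Asymptotic density of the reduced range: (z^2 / 2) K_0(z)."""
    z = z_of(R)
    return 0.5 * z * z * bessel_k(0, z)


def ode_residual(R, h=1e-3):
    """Finite-difference value of Psi'' + Psi' - Psi exp(-R)."""
    d1 = (Psi(R + h) - Psi(R - h)) / (2.0 * h)
    d2 = (Psi(R + h) - 2.0 * Psi(R) + Psi(R - h)) / (h * h)
    return d2 + d1 - Psi(R) * math.exp(-R)


for R in (0.0, 1.0, 3.0):
    print(R, Psi(R), psi(R), ode_residual(R))
```

The residuals are not exactly zero because of the finite-difference step; they should be small compared with the individual terms of the equation.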


5. The Corner Test for Association. John W. Tukey, Princeton University, 
and Paul S. Olmstead, Bell Telephone Laboratories. 


Construction. In a scatter diagram, draw the two medians, that is, the median of the 
$x$ values without regard to the values of $y$, and the median of the $y$ values without regard 
to the values of $x$. Think of the four quadrants thus formed as being labelled +, −, +, − 
in order, so that the two positive quadrants lie along one diagonal and the two negative 
along the other. Beginning at the right-hand side of the diagram, count in along the 
observations until forced to cross the horizontal median. Write down the number of 
observations met before this crossing, attaching the sign, +, if they lay in the + quadrant, 
and the sign, −, if they lay in the − quadrant. Repeat this process, moving up from 
below, moving to the right from the left, and moving down from above. The quantity to 
be used in the test is the algebraic sum of the four numbers thus written down. 
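
The construction can be sketched in Python as follows. This is one reading of the description above, not the authors' own procedure: it assumes that the + quadrants are the upper right and the lower left, that no two $x$'s and no two $y$'s are alike, and that an observation lying exactly on a median ends the count. The function name `corner_sum` is hypothetical.

```python
from statistics import median


def corner_sum(points):
    """Algebraic sum of the four signed corner counts for a list of (x, y) pairs."""
    mx = median(x for x, _ in points)
    my = median(y for _, y in points)

    def signed_run(ordered, offset, positive_side):
        # Count observations from the outside in until one falls on the
        # other side of the median that must not be crossed.
        first_side, count = None, 0
        for pt in ordered:
            d = offset(pt)
            side = (d > 0) - (d < 0)
            if side == 0:
                break  # assumption: a point on the median stops the count
            if first_side is None:
                first_side = side
            if side != first_side:
                break
            count += 1
        if count == 0:
            return 0
        return count if first_side == positive_side else -count

    by_x = sorted(points, key=lambda p: p[0])
    by_y = sorted(points, key=lambda p: p[1])
    above_h = lambda p: p[1] - my  # position relative to the horizontal median
    right_v = lambda p: p[0] - mx  # position relative to the vertical median
    return (
        signed_run(by_x[::-1], above_h, +1)   # in from the right: + if above
        + signed_run(by_x, above_h, -1)       # in from the left: + if below
        + signed_run(by_y[::-1], right_v, +1)  # down from above: + if to the right
        + signed_run(by_y, right_v, -1)        # up from below: + if to the left
    )


print(corner_sum([(i, i) for i in range(1, 9)]))      # perfect positive association
print(corner_sum([(i, 9 - i) for i in range(1, 9)]))  # perfect negative association
```

With eight observations and perfect association each of the four counts equals 4, so the two printed sums are +16 and −16.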

Distribution. The exact distribution of this quantity when no association is present 
and no two $x$'s and no two $y$'s are alike is almost independent of sample size over the range 
of values where it is apt to be used. For example, a sum of 9 or more is expected less than 
one time in ten for all samples of size 6 or more; a sum of 15 or more, less than one time in 
100 for all samples of size 10 or more; and a sum of 21 or more, less than one time in 1000 
for all samples of size 14 or more. Even for infinite sample size, the sums for these fractions 
become only 9, 14, and 19, respectively. 

Extensions. The same ideas that underlie the outside corner test for two variables 
may be extended in several ways to give tests for various types of association among three 
or more variables. 


6. Consistent Estimates Based on Partially Consistent Observations, with 
Particular Reference to Structural Relations. J. Neyman and Elizabeth 
L. Scott, University of California. 

Let $\{X_n\}$ be a sequence of independent random variables and let $F_i$ denote the 
distribution of $X_i$. Each distribution $F_i$ is assumed to depend on unknown parameters. If a 
parameter $\theta$ appears in an infinity of distributions $F_i$, it is called structural. Otherwise, 
it is incidental. The sequence $\{X_n\}$ is called consistent if $\{F_n\}$ has no incidental parameters; 
$\{X_n\}$ is called partially consistent if $\{F_n\}$ has both structural and incidental parameters. 

The problem of fitting a straight line when both variables are subject to errors is that of a 
partially consistent series of observations. Let $\xi$ and $\eta = \alpha + \beta\xi$ be two linearly connected 
quantities, perhaps related to particular stars, where $\alpha$ and $\beta$ are unknown. The values 
$\xi_i$ and $\eta_i$ corresponding to the $i$th star ($i = 1, 2, \ldots, s$) are unknown. The observations 
provide measurements $x_{ij}$ of $\xi_i$ ($j = 1, 2, \ldots, m_i$), and measurements $y_{ik}$ 
($k = 1, 2, \ldots, n_i$) of $\eta_i$. Both $m_i$ and $n_i$ are bounded and small. On the other hand, $s$ may be 
considered as increasing without limit. 

Assume that the $x_{ij}$ and the $y_{ik}$ are normally 
distributed with variances $\sigma_1^2$ and $\sigma_2^2$ and means $\xi_i$ and $\eta_i$ respectively. Then the totality 
of observations will form a partially consistent system with the structural parameters $\alpha$, $\beta$, 
$\sigma_1$ and $\sigma_2$ and with $\xi_i$ as incidental parameters. 

If the observable random variables are only 
partially consistent, then the maximum likelihood estimates of the structural parameters 
(a) need not be consistent, (b) even if they are consistent and asymptotically normal, 
alternative estimates may exist which have the same properties but smaller asymptotic 
variances. 

Consistent estimates of structural parameters may be obtained from "modified" 
equations of maximum likelihood. The lower bound of the variance of estimates of 
structural parameters, provided by the Cramér-Rao inequality, is attained only on certain 
conditions which are both necessary and sufficient. 





NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Dr. Paul H. Anderson has been appointed Economic Analyst with the Market- 
ing Division, Office of Domestic Commerce, Department of Commerce, Wash- 
ington. 

Dr. Gilbert W. Beebe is now with the Division of Medical Sciences, National 
Research Council, Washington. 

Professor Harald Cramér, Director of the Institute of Mathematical Statistics 
of the University of Stockholm, was awarded the degree of Doctor of Science, 
honoris causa, by Princeton University on February 22, 1947. Professor Cramér 
has acted as Visiting Professor of Mathematics at Princeton University and 
Yale University during the academic year 1946-’47. He will be at the Univer- 
sity of California at Berkeley during the 1947 Summer Session. 

Dr. Paul M. Densen has accepted a position with the Division of Medical 
Research Statistics, Bureau of Medicine and Surgery, Veterans Administra- 
tion, Washington. 

Mr. M. V. Divatia is now in charge of the office of the Statistician and Eco- 
nomic Adviser and Under-Secretary to the Government of Sind, Karachi, 
India. 

Mr. Clarence B. Fine, formerly with the Office of Price Administration, has 
transferred to the Bureau of Old-Age and Survivors Insurance, Social Security 
Administration, where he is employed as a Sampling Expert. 

Prof. Charles C. Grove was appointed Visiting Lecturer in Mathematics at 
the University of Pennsylvania for the spring semester. 

Assoc. Prof. E. E. Haskins of Northeastern University has been appointed to 
an assistant professorship at the Army Air Forces Institute of Technology, 
Wright Field, Dayton, Ohio. 

Prof. Roger Lessard of the Hull Technical School has accepted a position at 
the Ecole Polytechnique, Montreal. 

Mr. Edward D. Lowery is now a member of the Research Department, Win- 
chester Arms Company, New Haven, Connecticut. 

Professor H. B. Mann of Ohio State University has been awarded the Frank 
Nelson Cole prize in the Theory of Numbers for 1946. 

Dr. Margaret P. Martin has been appointed to an assistant professorship in 
the Department of Preventive Medicine and Public Health, Vanderbilt Uni- 
versity Medical School, Nashville, Tennessee. 

Dr. A. L. O’Toole is at present employed by the Veterans Administration in 
the Washington headquarters, as Acting Chief of the Administrative Analysis 
Division in the Research Service. Dr. O’Toole was released from the Navy on 
September 23, 1946, to inactive duty in the U. S. Naval Reserve, with the rank 
of Commander. Dr. O’Toole served for nearly four years in the Navy, in 
important administrative and statistical work for the Commander South Pacific 
Area and South Pacific Force. He will be remembered as having been with 
Admiral Halsey’s Pacific Fleet, and was awarded the Bronze Star Medal. At 
the time of his release, he was Chief Staff Officer for Commander South Pacific 
Area and South Pacific Force. 

Mr. I. B. Perrott, since his demobilization from the British Army, has been 
Lecturer in Mathematics at the College of Technology and Commerce, Leicester, 
England. 

Mr. J. S. Ripandelli is now with the Actuarial Department of the Jefferson 
Standard Life Insurance Company of Greensboro, North Carolina. 

Dr. Ronald W. Shephard of the University of California has been appointed 
to the staff of the Department of Mathematics, New York University. 

Mr. John R. Stehn is now a member of the Research Laboratory of the Gen- 
eral Electric Company, Schenectady, New York. 

Dr. Charles W. Vickery, formerly of Ohio State University, is engaged in work 
as a Research Consultant in New York City. 




Miss Margaret Jeannin Dix, of the University of California Statistical Labora- 
tory, died an accidental death at her home in Berkeley on June 20, 1946. 

Mr. Albert M. Freeman, of the Boston Fiduciary and Research Association, 
died May 20, 1946. 

Dr. Walter Schilling, of the Stanford University Hospital, died suddenly in 
San Francisco, December 16, 1946. 




Summer Statistical Session at the University of California at Berkeley 


The important advances in the theory of statistics during the war and espe- 
cially the unprecedented growth in the fields of application have created a 
strong demand for trained statisticians to fill both the research and the teaching 
positions all over the country. Since in many cases the war time education had 
to be somewhat sketchy, unsystematic, and not very conducive to a thorough 
coverage of the vast material, it is felt that a relatively brief set of courses on a 
rather advanced level would be beneficial to many persons, both those who 
already hold research or teaching positions in statistics and those who 
prepare for higher degrees. 

With this object in mind, the University of California at Berkeley is offering 
a set of statistical courses during the Summer Session, June 23rd to August 2nd, 
1947. There will be three courses: (i) General Theory of Random Variables and 
Frequency Distributions, by Harald Cramér of the University of Stockholm; 
(ii) Problems of Testing Hypotheses and of Estimation, by J. Neyman, Univer- 
sity of California, Berkeley; and (iii) Seminar Course. The last will be given by 
seven scholars, each giving two hours of lectures, as follows: 


1. Statistical Astronomy. R. J. TRUMPLER 
2. Orthogonal Polynomials and Problems of Moments. G. SZEGŐ 
3. Methods of Calculation. V. F. LENZEN 
   (a) Gibbs’ Methods in Statistical Mechanics. 
   (b) Darwin-Fowler Method of Statistics. 
4. Large Scale Sampling Surveys. P. C. MAHALANOBIS 
5. Statistical Problems Arising in Nuclear Physics Measurements. R. SERBER 
6. Problems of Population Genetics. S. EMERSON 
7. Interactions between Industrial Problems and Mathematical Statistics. H. SCHEFFÉ 


The purpose of the Seminar Course is to introduce the students either to 
branches of pure mathematics contingent on mathematical statistics but not 
ordinarily taught in the universities or to various fields of knowledge offering 
fruitful fields for statistical studies. 



Summer Statistical Session at Virginia Polytechnic Institute 


A Summer Statistical Session will be held at Virginia Polytechnic Institute, 
Blacksburg, Virginia, August 5 to September 5, 1947. This Session will be 
sponsored jointly by Virginia Polytechnic Institute, University of North Caro- 
lina, University of Michigan, Iowa State College, and the Federal Bureau of 
Agricultural Economics. 

The faculty will consist of: Walter A. Hendricks, B.A.E., U.S.D.A.; Rensis 
Likert, University of Michigan; H. L. Lucas, University of North Carolina; 
Maurice G. Kendall, England; George W. Snedecor, Iowa State College; Frank 
Yates, Rothamsted Experiment Station, England; Earl E. Houseman, B.A.E., 
U.S.D.A.; Raymond J. Jessen, Iowa State College, and Boyd Harshbarger, 
Virginia Polytechnic Institute. 

The following courses will be offered for credit: Engineering Statistics; Sta- 
tistical Methods; Design of Animal Experiments; Schedule Design and Interview 
Techniques for Sample Surveys; Sampling Design and Analysis; Mathematical 
Theory of Sampling; Seminar; Mathematical Statistics, and Experimental 
Design. 

In addition to the faculty, probable Seminar speakers are: W. F. Callendar, 
W. G. Cochran, Miss Gertrude M. Cox, W. E. Deming, George Gallup, M. H. 
Hansen, Harold Hotelling, Arnold King, and Charles F. Sarle. 

Inquiries regarding the Summer Session should be addressed to Boyd Harsh- 
barger, Professor of Statistics, Summer Statistical Session, Virginia Polytechnic 
Institute, Blacksburg, Virginia. 
New Members 


The following persons have been elected to membership in the Institute 
(January 1 to February 28, 1947): 


Asofsky, Samuel, B.S. (C.C.N.Y.) Stat., National Jewish Welfare Board, 1256 E. 13 St. 
Brooklyn 30, N.Y. 

Auer, Richard M., A.M. (Columbia) Instr. in Math., State Teachers Coll., Montclair, 
N.J., 88 No. 16 St., East Orange 

Bakan, David, M.A. (Indiana) Chief Stat., Comm. on Selection and Training of Aircraft 
Pilots, National Research Council, 259 Natatorium, Ohio State Univ., Columbus 10, 
Ohio 

Beatty, Glenn H., A.B. (Ohio State) Grad. student and Fellow, Iowa State College, Station 
A, General Delivery, Ames, Iowa 

Campbell, Wallace A., B.S. (Columbia) Stat. Analyst, War Assets Administration, 483 
Washington Ave., Brooklyn 16, N. Y. 

Cella, Francis R., M.A. (Kentucky) Assoc. Prof. of Statistics and Director, Bur. of Busi- 
ness Research, Univ. of Oklahoma, Norman, Okla. 

Chapman, Douglas G., M.A. (Toronto) Asst. Prof. of Math., Univ. of British Columbia, 
Vancouver, Canada 

Cheydleur, Benjamin F., B.A. (Wisconsin) Chief, Mechanized Analysis, Naval Ordnance 
Lab., 602 Avenue E, District Heights, Washington 19, D.C. 

Coombs, Clyde H., Ph.D. (Chicago) Ass’t Prof. of Psychology, and Research Psychologist, 
Institute for Human Adjustment, Univ. of Michigan, Ann Arbor, Mich., 1027 E. 
Huron 

Corton, Edward L., Jr., M.B.A. (Chicago) Grad. student, Iowa State Coll., 803 Hodge 
Ave., Ames, Iowa 

Davis, Harold., A.B. (Brooklyn Coll.) Stat., Navy Dept., 4/6—33 St., S.E., Washington, 
D.C. 

Dutton, Arthur M., B.S.E.E. (Iowa State) Grad. Fellow, Mathematics Dept., Iowa State 
Coll., Ames, Iowa 

Fay, Edward A., A.M. (Harvard) Grad. student, Univ. of California, Berkeley, 415 South 
17th St., Apt. 2B, Richmond, Calif. 

Flanagan, John C., Ph.D. (Harvard) Prof. of Psychology, Univ. of Pittsburgh, Pitts- 
burgh 13, Pa. 

Gardner, Eric F., Ed.M. (Boston Teachers) Teaching Fellow and Milton Fellow, Grad. 
School of Educ., Harvard Univ., Cambridge, Mass., Walker House, 40 Quincy St. 

Gerende, Lincoln J., C.Ph.M., U.S. Navy, Naval Medical Res. Institute, National Naval 
Medical Center, Bethesda 14, Md. 

Grossman, Evelyn, M.A. (Columbia) Stat., U. S. Dept. of Agriculture, 640/—14 St., 
N. W., Washington 12, D.C. 

Hill, Edwin A., Jr., M.A. (Columbia) Instr. in Math., Coll. of the City of N. Y., 50 West 
67 St., New York 23, N. Y. 

Horton, H. Burke, M.B.A. (Texas) Senior Transport Analyst, 2906 Naylor Rd., S. E., 
Washington 20, D.C. 

Horvitz, Daniel G., B.S. (Mass. State) Grad. student, Iowa State Coll., 2137 Country Club 
Blvd., Ames, Iowa 

Ikhtiar-ul-Mulk, S. M., M.A. (Punjab, India) Grad. student, Princeton Univ., Graduate 
College, Princeton, N. J. 

Jaeger, Carol M., B.A. (Dubuque) Statistician, 1300 Columbia Terrace, Peoria 6, Ill. 

Jessen, Raymond J., Ph.D. (Iowa State) Res. Assoc. Prof., Iowa State College, and 
Agric. Statistician, U.S.D.A., Statistical Lab., Iowa State Coll., Ames, Iowa 

Kinzer, Mrs. Lydia Greene, M.A. (Kansas) Ass’t Instr. in Math., Ohio State Univ., 
585 East Town Street, Columbus 16, Ohio 
Langenhop, Carl E., M.S. (Iowa State) Instr. in Math., Iowa State Coll., Apt. 3, Cranford 
Annex, Ames, Iowa 

Lowy, Melitta E., A.B. (Hunter) Statistician, Grad. student, Columbia Univ., 645 West 
End Ave., New York 25, N. Y. 

Mattila, Sakari, Fil.Mag. (Helsinki) High School of Commerce, Helsinki, Finland 

Mayerson, Allen L., B.S. (Michigan) Grad. student and Teaching Fellow, Univ. of Mich., 
1302 Packard St., Ann Arbor, Mich. 

McCreary, Garnet E., M.A. (Queen’s Univ.) Research Fellow, Statistical Lab., Iowa 
State Coll., Ames, Iowa 

McMillan, Olan T., M.A. (Michigan) Instr. in Math., Michigan State Coll., East Lansing, 
Mich. 

Morris, Edward B., A.B. (Indiana) Statistician, U.S. Bur. of Labor Statistics, 1915 Ridge 
Place S. E., Washington 20, D.C. 

Moshman, Jack, B.A. (New York) Tutor in Math., Queens Coll., Flushing, N. Y., 125-09 
Liberty Ave., Richmond Hill 19 

Natrella, Mrs. Mary G., B.A. (Pennsylvania) Statistician, Bureau of Ships, Navy Dept., 
1210—12th St., N. W. Washington 5, D.C. 

Neal, T. Ellison, A.B. (Geo. Washington) Statistician, Textile Dev. Dept., U.S. Rubber 
Co., Hogansville, Ga. 

Noble, Carl E., Ph.D. (Iowa) Quality Methods Engineer, Kimberly Clark Corp., Lake- 
view Mill, Neenah, Wis. 

Ostle, Bernard, M.A. (British Columbia) Teaching Ass’t, School of Bus. Adm., Univ. of 
Minnesota, Minneapolis, Minn. 

Oxtoby, Toby E., B.A. (Iowa) Grad. Ass’t, Dept. of Psychology, State Univ. of Iowa, 
Iowa City, Iowa 

Peisakoff, Melvin P., Student, Princeton Univ., 34 North West College, Princeton, N. J. 

Rothschild, Colette, (Ecole Normale Superieure) Attachee de Recherches au Centre Na- 
tional de la Recherche Scientifique, 43 rue Madame, Paris VI*, France 

Slonim, Morris J., M.B.A. (Harvard) Statistician, Bureau of Labor Statistics, 210 Wayne 
Place S. E., Washington 20, D.C. 

Soler, Reuben I., B.B.A. (C.C.N.Y.) Statistician, Food and Drug Administration, 246 
Portland St., S. E., Washington, D.C. 

Stouffer, Samuel A., Ph.D. (Chicago) Prof. of Sociology and Director of the Laboratory 
of Social Relations, Emerson Hall, Harvard Univ., Cambridge, Mass. 

Teicher, Henry, B.A. (Iowa) Graduate student, Columbia Univ., 139 Osborne Terrace, 
Newark, N. J. 

Tiedeman, David V., M.A. (Rochester) Instr. in Educ., Grad. School of Educ., Harvard 
Univ., Walker House, 40 Quincy St., Cambridge 38, Mass. 

Tintner, Gerhard, Ph.D. (Vienna) Prof. of Economics and Mathematics, Iowa State 
Coll., Ames, Iowa 

Weiss, Eleanor S., Ed.M. (Boston Teachers) Teaching Fellow, Grad. School of Educ., 
Harvard Univ., 2005 Commonwealth Ave., Brighton 35, Mass. 

Wilson, William A., Jr., A.B. (California) Teaching Ass’t in Psychology, Univ. of Calif., 
Berkeley 4, Calif. 

Woodell, Allan D., A.B. (N. Y. State Teachers, Albany) Graduate student in math., Univ. 
of Mich., 425 Church St., Ann Arbor, Mich. 
Omitted from 1946 lists of new members: 


Feraud, Prof. Lucien, Faculte des Sciences Economiques et Sociales, Univ. de Geneve, 
24 rue Henri Mussard, Genève, Switzerland 





REPORT ON THE ATLANTIC CITY MEETING OF THE INSTITUTE 


The Ninth Annual Meeting of the Institute of Mathematical Statistics was 
held at Atlantic City, New Jersey, on Friday and Saturday, January 24-25, 1947. 
The meeting was held in conjunction with meetings of the American Economic 
Association, American Statistical Association, and the Econometric Society. 
The following 154 members of the Institute attended the meeting: 


Beatrice Aitchison, F. L. Alt, R. L. Anderson, T. W. Anderson, K. J. Arrow, Max Astrachan, 
B. M. Bennett, Joseph Berkson, A. J. Berman, C. I. Bliss, Paul Boschan, A. E. Brandt, 
M. F. Bresnahan, Philip Brown, O. P. Bruno, R. W. Burgess, O. K. Buros, B. H. Camp, 
F. R. Cella, Uttam Chand, K. L. Chung, C. W. Churchman, P. C. Clifford, W. J. Cobb, 
W. G. Cochran, F. G. Cornell, D. R. Cowan, Harald Cramér, J. H. Curtiss, J. F. Daly, 
G. B. Dantzig, D. G. Deihl, D. B. DeLury, B. W. Dempsey, H. F. Dorn, F. W. Dresch, 
A. J. Duncan, David Durand, P. S. Dwyer, Churchill Eisenhart, W. D. Evans, Will Feller, 
C. D. Ferris, Irving Fisher, L. R. Frankel, M. A. Geisler, Leon Gilford, M. A. Girshick, 
C. H. Graves, K. E. Greene, S. W. Greenhouse, F. E. Grubbs, E. J. Gumbel, Margaret Gurney, 
Louis Guttman, Trygve Haavelmo, K. W. Halbert, M. H. Hansen, Miriam S. Harold, 
T. E. Harris, Boyd Harshbarger, Bernard Hecht, Wassily Hoeffding, H. B. Horton, 
Harold Hotelling, E. E. Houseman, Helen M. Humes, Leonid Hurwicz, Seymour Jablon, 
R. W. James, R. J. Jessen, H. L. Jones, Alice S. Kaitz, H. B. Kaitz, L. S. Kellogg, H. S. Konijn, 
Tjalling Koopmans, C. F. Kossack, R. L. Kozelka, D. H. Leavens, Howard Levene, 
J. E. Lieberman, Rensis Likert, S. B. Littauer, Irving Lorge, P. J. McCarthy, P. W. McGann, 
F. E. McIntyre, H. F. MacNeish, J. D. Maddrill, Jacob Marschak, Max Millikan, 
A. M. Mood, Mrs. Margaret Moore, J. W. Morse, J. E. Morton, Frederick Mosteller, D. N. Nanda, 
P. M. Neurath, Jerzy Neyman, M. L. Norden, Nilan Norris, H. W. Norton, P. S. Olmstead, 
E. G. Olds, Sophie Rakesky, Chester Rapkin, Olav Reiersol, W. A. Reynolds, 
P. R. Rider, C. F. Roos, A. C. Rosander, Ernest Rubin, Herman Rubin, P. J. Rulon, Frank Saidel, 
Marion M. Sandomire, Max Sasuly, F. E. Satterthwaite, E. D. Schell, E. M. Schrock, 
D. H. Schwartz, G. R. Seth, L. W. Shaw, W. A. Shewhart, J. H. Smith, R. T. Smith, Leslie E. Simon, 
Milton Sobel, C. M. Stein, G. T. Steinberg, Joseph Steinberg, H. W. Steinhaus, 
F. F. Stephan, A. P. Stergion, M. S. Stevens, G. J. Stigler, S. A. Stouffer, Zenon Szatrowski, 
B. J. Tepping, J. W. Tukey, D. F. Votaw, Jr., Helen M. Walker, J. H. Watkins, Louis Weiner, 
Samuel Weiss, S. S. Wilks, Elizabeth W. Wilson, C. P. Winsor, J. Wolfowitz, M. A. Woodbury, 
Holbrook Working, C. A. Wright, and T. O. Yntema. 


The first session, a joint session with the Econometric Society and the Bio- 
metrics Section of the American Statistical Association, was held at two o’clock 
on Friday afternoon, and was devoted to the topic, Applications of Statistical 
Techniques to Agricultural Economics. Holbrook Working of Stanford Uni- 
versity presided. The following four papers were presented: 


1. Use of Variance Components in the Analysis of Market Differentials in Hog Prices. 
R. L. Anderson, University of North Carolina. 

2. An Application of the Analysis of Variance in the Economic Evaluation of Production. 
Boyd Harshbarger, Virginia Polytechnic Institute. 

3. A Model of the Economic Interdependence between Agriculture and the National Economy. 
Trygve Haavelmo, Cowles Commission for Research in Economics. 

4. The Reduced-Form Method for Estimating Simultaneous Economic Relationships. 
M. A. Girshick, Bureau of the Census. 

The session concluded with a discussion of these papers by T. W. Anderson, 
Columbia University; Milton Friedman, University of Chicago; and Harold 
Hotelling, University of North Carolina. 

At 8 o’clock on Friday evening there was a joint session with the Econometric 
Society and the American Statistical Association, on the topic, When is the 
Analysis of Variance Useful in Economic Research? Arthur R. Tebbutt of 
Northwestern University presided, and the following three papers were presented: 


1. The Advantages of the Analysis of Variance for Research and Managerial Control 
Purposes. 
Harry Pelle Hartkemeier, University of Missouri. 

2. Estimation of Economic Relationships and Multivariate Regression. 
Leonid Hurwicz, Iowa State College. 

3. Nonstandard Forms of Variance Analysis. 
W. Allen Wallis, University of Chicago. 


There was discussion of these papers by Tjalling Koopmans, Cowles Commission 
for Research in Economics; Gerhard Tintner, Iowa State College; and J. W. 
Tukey, Princeton University. 

At 10 o’clock on Saturday morning there was a joint session with the American 
Statistical Association devoted to the topic, Use of Ordered Observations in 
Statistical Analysis, with Harold Hotelling of the University of North Carolina 
as chairman. The following two papers were presented: 


1. Estimation of Parameters by Use of Order Statistics. 
Frederick Mosteller, Harvard University. 

2. Tolerance Limits. 
Jacob Wolfowitz, Columbia University. 


There was discussion of these papers by John H. Smith, Bureau of Labor Sta- 
tistics; Howard L. Jones, Illinois Bell Telephone Company; and J. W. Tukey, 
Princeton University. 

At the Saturday morning session one contributed paper of the Institute of 
Mathematical Statistics was also presented, by E. J. Gumbel, Newark College 
of Engineering, on the topic: The Asymptotic Distribution of the Range. 

The Institute’s session at 2 o’clock Saturday afternoon was devoted to con- 
tributed papers. W. G. Cochran, president of the Institute, presided, and the 
following four papers were presented: 


1. A Test of Significance of the Coefficient of Rank Correlation for More than Thirty Ranked 
Items. 
Nilan Norris, Hunter College. 

2. A Generalized T Measure of Multivariate Dispersion. 
Harold Hotelling, University of North Carolina. 

3. Asymptotic Properties of Maximum and Quasi-Maximum Likelihood Estimates. 
Herman Rubin, Cowles Commission for Research in Economics. 

4. The Corner Test for Association. 
J. W. Tukey, Princeton University, and Paul Olmstead, Bell Telephone Laboratories. 

Abstracts of these papers appear elsewhere in this issue. 

Following the session on contributed papers, Professor Jerzy Neyman of the 
University of California gave an invited address on the topic: On Consistent 
Estimates, with Particular Reference to Structural Relations between Several Vari- 
ables all Subject to Random Error. A discussion of this address followed, by 
Miss E. L. Scott, University of California; A. Wald, Columbia University; and 
Tjalling Koopmans, Cowles Commission for Research in Economics. 

The meeting closed with the annual business meeting of the Institute, which 
was held at 5 p.m. on Saturday in Haddon Hall. Reports by the President, 
Secretary-Treasurer, and Editor were followed by the election of officers for 
1947: Will Feller, President; Morris H. Hansen and John H. Curtiss, Vice- 
Presidents; and Paul S. Dwyer, Secretary-Treasurer. 

P. S. Dwyer, 
Secretary. 





